* [PATCH v2 0/8] Add Kernel Concurrency Sanitizer (KCSAN)
@ 2019-10-17 14:12 ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:12 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

This is the patch-series for the Kernel Concurrency Sanitizer (KCSAN).
KCSAN is a sampling watchpoint-based data-race detector. More details
are included in Documentation/dev-tools/kcsan.rst. This patch-series
only enables KCSAN for x86, but we expect that adding support for other
architectures will be relatively straightforward (we are aware of
experimental ARM64 and POWER support).

To gather early feedback, we announced KCSAN back in September, and
have integrated the feedback where possible:
http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com

We want to point out and acknowledge the work surrounding the LKMM,
including several articles that motivate why data-races are dangerous
[1, 2], justifying a data-race detector such as KCSAN.
[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/

The current list of known upstream fixes for data-races found by KCSAN
can be found here:
https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan

Changelog
---------
v2:
* Elaborate comment about instrumentation calls emitted by compilers.
* Replace kcsan_check_access(.., {true, false}) with
  kcsan_check_{read,write} for improved readability.
* Introduce __atomic_check_{read,write} in atomic-instrumented.h [Suggested by
  Mark Rutland].
* Change bug title of race of unknown origin to just say "data-race in".
* Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
* Add comment about safety of find_watchpoint without user_access_save.
* Remove unnecessary preempt_disable/enable and elaborate on the comment
  explaining why we want to disable interrupts and preemption.
* Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
  contexts [Suggested by Mark Rutland].
* Document x86 build exceptions where no existing comment above them
  explained why we cannot instrument.

v1: http://lkml.kernel.org/r/20191016083959.186860-1-elver@google.com


Marco Elver (8):
  kcsan: Add Kernel Concurrency Sanitizer infrastructure
  objtool, kcsan: Add KCSAN runtime functions to whitelist
  build, kcsan: Add KCSAN build exceptions
  seqlock, kcsan: Add annotations for KCSAN
  seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
  asm-generic, kcsan: Add KCSAN instrumentation for bitops
  locking/atomics, kcsan: Add KCSAN instrumentation
  x86, kcsan: Enable KCSAN for x86

 Documentation/dev-tools/kcsan.rst         | 203 ++++++++++
 MAINTAINERS                               |  11 +
 Makefile                                  |   3 +-
 arch/x86/Kconfig                          |   1 +
 arch/x86/boot/Makefile                    |   2 +
 arch/x86/boot/compressed/Makefile         |   2 +
 arch/x86/entry/vdso/Makefile              |   3 +
 arch/x86/include/asm/bitops.h             |   6 +-
 arch/x86/kernel/Makefile                  |   7 +
 arch/x86/kernel/cpu/Makefile              |   3 +
 arch/x86/lib/Makefile                     |   4 +
 arch/x86/mm/Makefile                      |   3 +
 arch/x86/purgatory/Makefile               |   2 +
 arch/x86/realmode/Makefile                |   3 +
 arch/x86/realmode/rm/Makefile             |   3 +
 drivers/firmware/efi/libstub/Makefile     |   2 +
 include/asm-generic/atomic-instrumented.h | 393 ++++++++++----------
 include/asm-generic/bitops-instrumented.h |  18 +
 include/linux/compiler-clang.h            |   9 +
 include/linux/compiler-gcc.h              |   7 +
 include/linux/compiler.h                  |  35 +-
 include/linux/kcsan-checks.h              | 147 ++++++++
 include/linux/kcsan.h                     | 108 ++++++
 include/linux/sched.h                     |   4 +
 include/linux/seqlock.h                   |  51 ++-
 init/init_task.c                          |   8 +
 init/main.c                               |   2 +
 kernel/Makefile                           |   6 +
 kernel/kcsan/Makefile                     |  14 +
 kernel/kcsan/atomic.c                     |  21 ++
 kernel/kcsan/core.c                       | 428 ++++++++++++++++++++++
 kernel/kcsan/debugfs.c                    | 225 ++++++++++++
 kernel/kcsan/encoding.h                   |  94 +++++
 kernel/kcsan/kcsan.c                      |  86 +++++
 kernel/kcsan/kcsan.h                      | 140 +++++++
 kernel/kcsan/report.c                     | 306 ++++++++++++++++
 kernel/kcsan/test.c                       | 117 ++++++
 kernel/sched/Makefile                     |   6 +
 lib/Kconfig.debug                         |   2 +
 lib/Kconfig.kcsan                         |  88 +++++
 lib/Makefile                              |   3 +
 mm/Makefile                               |   8 +
 scripts/Makefile.kcsan                    |   6 +
 scripts/Makefile.lib                      |  10 +
 scripts/atomic/gen-atomic-instrumented.sh |  17 +-
 tools/objtool/check.c                     |  17 +
 46 files changed, 2428 insertions(+), 206 deletions(-)
 create mode 100644 Documentation/dev-tools/kcsan.rst
 create mode 100644 include/linux/kcsan-checks.h
 create mode 100644 include/linux/kcsan.h
 create mode 100644 kernel/kcsan/Makefile
 create mode 100644 kernel/kcsan/atomic.c
 create mode 100644 kernel/kcsan/core.c
 create mode 100644 kernel/kcsan/debugfs.c
 create mode 100644 kernel/kcsan/encoding.h
 create mode 100644 kernel/kcsan/kcsan.c
 create mode 100644 kernel/kcsan/kcsan.h
 create mode 100644 kernel/kcsan/report.c
 create mode 100644 kernel/kcsan/test.c
 create mode 100644 lib/Kconfig.kcsan
 create mode 100644 scripts/Makefile.kcsan

-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:12   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:12 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
kernel space. KCSAN is a sampling watchpoint-based data-race detector.
See the included Documentation/dev-tools/kcsan.rst for more details.

This patch adds basic infrastructure, but does not yet enable KCSAN for
any architecture.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Elaborate comment about instrumentation calls emitted by compilers.
* Replace kcsan_check_access(.., {true, false}) with
  kcsan_check_{read,write} for improved readability.
* Change bug title of race of unknown origin to just say "data-race in".
* Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
* Add comment about safety of find_watchpoint without user_access_save.
* Remove unnecessary preempt_disable/enable and elaborate on the comment
  explaining why we want to disable interrupts and preemption.
* Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
  contexts [Suggested by Mark Rutland].
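
For readers new to the approach, the three steps listed under
"Implementation Details" in kcsan.rst can be sketched in user space as
below. This is only an illustration under stated assumptions, not the code
in kernel/kcsan/: the bit layout in encode_wp(), the sketch_* names, the
consumed[] array and the stall_* helpers are all hypothetical. The stall
callback stands in for both the delay and whatever other threads or devices
do during it, and sampling is omitted, so every call sets up a watchpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: bit 63 = write, bits 56..62 = size, bits 0..55 = address. */
#define WP_WRITE      (1ULL << 63)
#define WP_SIZE_SHIFT 56
#define WP_SIZE_MASK  0x7fULL
#define WP_ADDR_MASK  ((1ULL << WP_SIZE_SHIFT) - 1)
#define NUM_SLOTS     4

enum race_kind { RACE_NONE, RACE_OBSERVED, RACE_UNKNOWN_ORIGIN };

static uint64_t watchpoints[NUM_SLOTS];	/* 0 means the slot is free */
static bool consumed[NUM_SLOTS];	/* set when a conflicting access hits a slot */

uint64_t encode_wp(uintptr_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE : 0) |
	       (((uint64_t)size & WP_SIZE_MASK) << WP_SIZE_SHIFT) |
	       ((uint64_t)addr & WP_ADDR_MASK);
}

/* Step 1: returns true if no conflicting watchpoint overlaps this access. */
bool sketch_check_watchpoint(const void *ptr, size_t size, bool is_write)
{
	uint64_t addr = (uint64_t)(uintptr_t)ptr & WP_ADDR_MASK;
	bool no_race = true;

	for (int i = 0; i < NUM_SLOTS; i++) {
		uint64_t wp = watchpoints[i];
		uint64_t wp_addr = wp & WP_ADDR_MASK;
		uint64_t wp_size = (wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK;

		if (!wp || addr >= wp_addr + wp_size || wp_addr >= addr + size)
			continue;	/* free slot or no overlap */
		if (is_write || (wp & WP_WRITE)) {
			consumed[i] = true;	/* tell the watchpoint's owner */
			no_race = false;
		}
	}
	return no_race;
}

/* Steps 2 and 3: set up a watchpoint, stall, then look for evidence of a race. */
enum race_kind sketch_setup_watchpoint(const void *ptr, size_t size,
				       bool is_write, void (*stall)(void))
{
	unsigned char before[8];
	enum race_kind result = RACE_NONE;
	int slot;

	assert(size > 0 && size <= sizeof(before));
	for (slot = 0; slot < NUM_SLOTS && watchpoints[slot]; slot++)
		;
	if (slot == NUM_SLOTS)
		return RACE_NONE;	/* no free slot: skip this access */

	memcpy(before, ptr, size);	/* value before the delay */
	consumed[slot] = false;
	watchpoints[slot] = encode_wp((uintptr_t)ptr, size, is_write);

	stall();			/* stand-in for the delay */

	if (consumed[slot])
		result = RACE_OBSERVED;		/* an instrumented access hit us */
	else if (memcmp(before, ptr, size) != 0)
		result = RACE_UNKNOWN_ORIGIN;	/* value changed behind our back */

	watchpoints[slot] = 0;
	return result;
}

int shared;	/* memory location under test */

/* Simulated activity during the stall. */
void stall_idle(void) { }
void stall_racing_read(void)
{
	(void)sketch_check_watchpoint(&shared, sizeof(shared), false);
}
void stall_racing_write(void)
{
	(void)sketch_check_watchpoint(&shared, sizeof(shared), true);
	shared++;
}
void stall_unseen_write(void) { shared++; }	/* uninstrumented, e.g. a device */
```

In this sketch, two overlapping reads do not conflict, a write that runs
its check during the stall is reported as an observed race, and a write
that bypasses the check is only visible through the changed data value,
which corresponds to the "race at unknown origin" report.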
---
 Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
 MAINTAINERS                       |  11 +
 Makefile                          |   3 +-
 include/linux/compiler-clang.h    |   9 +
 include/linux/compiler-gcc.h      |   7 +
 include/linux/compiler.h          |  35 ++-
 include/linux/kcsan-checks.h      | 147 ++++++++++
 include/linux/kcsan.h             | 108 ++++++++
 include/linux/sched.h             |   4 +
 init/init_task.c                  |   8 +
 init/main.c                       |   2 +
 kernel/Makefile                   |   1 +
 kernel/kcsan/Makefile             |  14 +
 kernel/kcsan/atomic.c             |  21 ++
 kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
 kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
 kernel/kcsan/encoding.h           |  94 +++++++
 kernel/kcsan/kcsan.c              |  86 ++++++
 kernel/kcsan/kcsan.h              | 140 ++++++++++
 kernel/kcsan/report.c             | 306 +++++++++++++++++++++
 kernel/kcsan/test.c               | 117 ++++++++
 lib/Kconfig.debug                 |   2 +
 lib/Kconfig.kcsan                 |  88 ++++++
 lib/Makefile                      |   3 +
 scripts/Makefile.kcsan            |   6 +
 scripts/Makefile.lib              |  10 +
 26 files changed, 2069 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/dev-tools/kcsan.rst
 create mode 100644 include/linux/kcsan-checks.h
 create mode 100644 include/linux/kcsan.h
 create mode 100644 kernel/kcsan/Makefile
 create mode 100644 kernel/kcsan/atomic.c
 create mode 100644 kernel/kcsan/core.c
 create mode 100644 kernel/kcsan/debugfs.c
 create mode 100644 kernel/kcsan/encoding.h
 create mode 100644 kernel/kcsan/kcsan.c
 create mode 100644 kernel/kcsan/kcsan.h
 create mode 100644 kernel/kcsan/report.c
 create mode 100644 kernel/kcsan/test.c
 create mode 100644 lib/Kconfig.kcsan
 create mode 100644 scripts/Makefile.kcsan

diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
new file mode 100644
index 000000000000..497b09e5cc96
--- /dev/null
+++ b/Documentation/dev-tools/kcsan.rst
@@ -0,0 +1,203 @@
+The Kernel Concurrency Sanitizer (KCSAN)
+========================================
+
+Overview
+--------
+
+*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
+kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
+is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
+detector. Key priorities in KCSAN's design are lack of false positives,
+scalability, and simplicity. More details can be found in `Implementation
+Details`_.
+
+KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
+supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
+With Clang it requires version 7.0.0 or later.
+
+Usage
+-----
+
+To enable KCSAN, configure the kernel with::
+
+    CONFIG_KCSAN = y
+
+KCSAN provides several other configuration options to customize behaviour (see
+their respective help text for more info).
+
+debugfs
+~~~~~~~
+
+* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
+
+* KCSAN can be turned on or off by writing ``on`` or ``off`` to
+  ``/sys/kernel/debug/kcsan``.
+
+* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
+  ``some_func_name`` to the report filter list, which (by default) blacklists
+  reporting data-races where either one of the top stack frames is a function
+  in the list.
+
+* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
+  changes the report filtering behaviour. For example, the blacklist feature
+  can be used to silence frequently occurring data-races; the whitelist feature
+  can help with reproduction and testing of fixes.
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical data-race report looks like this::
+
+    ==================================================================
+    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
+
+    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
+     kernfs_refresh_inode+0x70/0x170
+     kernfs_iop_permission+0x4f/0x90
+     inode_permission+0x190/0x200
+     link_path_walk.part.0+0x503/0x8e0
+     path_lookupat.isra.0+0x69/0x4d0
+     filename_lookup+0x136/0x280
+     user_path_at_empty+0x47/0x60
+     vfs_statx+0x9b/0x130
+     __do_sys_newlstat+0x50/0xb0
+     __x64_sys_newlstat+0x37/0x50
+     do_syscall_64+0x85/0x260
+     entry_SYSCALL_64_after_hwframe+0x44/0xa9
+
+    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
+     generic_permission+0x5b/0x2a0
+     kernfs_iop_permission+0x66/0x90
+     inode_permission+0x190/0x200
+     link_path_walk.part.0+0x503/0x8e0
+     path_lookupat.isra.0+0x69/0x4d0
+     filename_lookup+0x136/0x280
+     user_path_at_empty+0x47/0x60
+     do_faccessat+0x11a/0x390
+     __x64_sys_access+0x3c/0x50
+     do_syscall_64+0x85/0x260
+     entry_SYSCALL_64_after_hwframe+0x44/0xa9
+
+    Reported by Kernel Concurrency Sanitizer on:
+    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
+    ==================================================================
+
+The header of the report provides a short summary of the functions involved in
+the race. It is followed by the access types and stack traces of the 2 threads
+involved in the data-race.
+
+The other less common type of data-race report looks like this::
+
+    ==================================================================
+    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
+
+    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
+     e1000_clean_rx_irq+0x551/0xb10
+     e1000_clean+0x533/0xda0
+     net_rx_action+0x329/0x900
+     __do_softirq+0xdb/0x2db
+     irq_exit+0x9b/0xa0
+     do_IRQ+0x9c/0xf0
+     ret_from_intr+0x0/0x18
+     default_idle+0x3f/0x220
+     arch_cpu_idle+0x21/0x30
+     do_idle+0x1df/0x230
+     cpu_startup_entry+0x14/0x20
+     rest_init+0xc5/0xcb
+     arch_call_rest_init+0x13/0x2b
+     start_kernel+0x6db/0x700
+
+    Reported by Kernel Concurrency Sanitizer on:
+    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
+    ==================================================================
+
+This report is generated where it was not possible to determine the other
+racing thread, but a race was inferred due to the data-value of the watched
+memory location having changed. These can occur either due to missing
+instrumentation or e.g. DMA accesses.
+
+Data-Races
+----------
+
+Informally, two operations *conflict* if they access the same memory location,
+and at least one of them is a write operation. In an execution, two memory
+operations from different threads form a **data-race** if they *conflict*, at
+least one of them is a *plain access* (non-atomic), and they are *unordered* in
+the "happens-before" order according to the `LKMM
+<../../tools/memory-model/Documentation/explanation.txt>`_.
+
+Relationship with the Linux Kernel Memory Model (LKMM)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The LKMM defines the propagation and ordering rules of various memory
+operations, which gives developers the ability to reason about concurrent code.
+Ultimately this allows developers to determine the possible executions of
+concurrent code, and whether that code is free from data-races.
+
+KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
+``atomic_*``, etc.), but is oblivious to any ordering guarantees. In other
+words, KCSAN assumes that as long as a plain access is not observed to race
+with another conflicting access, memory operations are correctly ordered.
+
+This means that KCSAN will not report *potential* data-races due to missing
+memory ordering. If, however, missing memory ordering (that is observable with
+a particular compiler and architecture) leads to an observable data-race (e.g.
+entering a critical section erroneously), KCSAN would report the resulting
+data-race.
+
+Implementation Details
+----------------------
+
+The general approach is inspired by `DataCollider
+<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
+Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
+relies on compiler instrumentation. Watchpoints are implemented using an
+efficient encoding that stores access type, size, and address in a long; the
+benefits of using "soft watchpoints" are portability and greater flexibility in
+limiting which accesses trigger a watchpoint.
+
+More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
+memory operations; for each instrumented plain access:
+
+1. Check if a matching watchpoint exists; if yes, and at least one access is a
+   write, then we encountered a racing access.
+
+2. Periodically, if no matching watchpoint exists, set up a watchpoint and
+   stall for a short delay.
+
+3. Also check the data value before the delay, and re-check the data value
+   after the delay; if the values mismatch, we infer a race of unknown origin.
+
+To detect data-races between plain and atomic memory operations, KCSAN also
+annotates atomic accesses, but only to check if a watchpoint exists
+(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
+accesses.
+
+Key Properties
+~~~~~~~~~~~~~~
+
+1. **Memory Overhead:** No shadow memory is required. The current
+   implementation uses a small array of longs to encode watchpoint information,
+   which is negligible.
+
+2. **Performance Overhead:** KCSAN's runtime aims for minimal overhead, using
+   an efficient watchpoint encoding that does not require acquiring any shared
+   locks in the fast-path. For kernel boot with a default config on a system
+   where nproc=8 we measure a slow-down of 10-15x.
+
+3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
+   may result in missed data-races (false negatives), compared to a
+   happens-before data-race detector.
+
+4. **Accuracy:** Imprecise, since it uses a sampling strategy.
+
+5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
+   runtime. With a happens-before data-race detector, any omitted annotation
+   leads to false positives, which matters especially in the kernel, which
+   includes numerous custom synchronization mechanisms. Because KCSAN needs few
+   annotations, maintenance overheads stay minimal as the kernel evolves.
+
+6. **Detects Racy Writes from Devices:** Due to checking data values upon
+   setting up watchpoints, racy writes from devices can also be detected.
diff --git a/MAINTAINERS b/MAINTAINERS
index 0154674cbad3..71f7fb625490 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8847,6 +8847,17 @@ F:	Documentation/kbuild/kconfig*
 F:	scripts/kconfig/
 F:	scripts/Kconfig.include
 
+KCSAN
+M:	Marco Elver <elver@google.com>
+R:	Dmitry Vyukov <dvyukov@google.com>
+L:	kasan-dev@googlegroups.com
+S:	Maintained
+F:	Documentation/dev-tools/kcsan.rst
+F:	include/linux/kcsan*.h
+F:	kernel/kcsan/
+F:	lib/Kconfig.kcsan
+F:	scripts/Makefile.kcsan
+
 KDUMP
 M:	Dave Young <dyoung@redhat.com>
 M:	Baoquan He <bhe@redhat.com>
diff --git a/Makefile b/Makefile
index ffd7a912fc46..ad4729176252 100644
--- a/Makefile
+++ b/Makefile
@@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
 
 export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
 export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
-export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
+export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
 export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
 export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
 export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
@@ -900,6 +900,7 @@ endif
 include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 include scripts/Makefile.ubsan
+include scripts/Makefile.kcsan
 
 # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
 KBUILD_CPPFLAGS += $(KCPPFLAGS)
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 333a6695a918..a213eb55e725 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -24,6 +24,15 @@
 #define __no_sanitize_address
 #endif
 
+#if __has_feature(thread_sanitizer)
+/* emulate gcc's __SANITIZE_THREAD__ flag */
+#define __SANITIZE_THREAD__
+#define __no_sanitize_thread \
+		__attribute__((no_sanitize("thread")))
+#else
+#define __no_sanitize_thread
+#endif
+
 /*
  * Not all versions of clang implement the the type-generic versions
  * of the builtin overflow checkers. Fortunately, clang implements
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index d7ee4c6bad48..de105ca29282 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -145,6 +145,13 @@
 #define __no_sanitize_address
 #endif
 
+#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
+#define __no_sanitize_thread                                                   \
+	__attribute__((__noinline__)) __attribute__((no_sanitize_thread))
+#else
+#define __no_sanitize_thread
+#endif
+
 #if GCC_VERSION >= 50100
 #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
 #endif
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 5e88e7e33abe..350d80dbee4d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #endif
 
 #include <uapi/linux/types.h>
+#include <linux/kcsan-checks.h>
 
 #define __READ_ONCE_SIZE						\
 ({									\
@@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 	}								\
 })
 
-static __always_inline
-void __read_once_size(const volatile void *p, void *res, int size)
-{
-	__READ_ONCE_SIZE;
-}
-
 #ifdef CONFIG_KASAN
 /*
  * We can't declare function 'inline' because __no_sanitize_address confilcts
@@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
 # define __no_kasan_or_inline __always_inline
 #endif
 
-static __no_kasan_or_inline
+#ifdef CONFIG_KCSAN
+# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
+#else
+# define __no_kcsan_or_inline __always_inline
+#endif
+
+#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
+/* Avoid any instrumentation or inline. */
+#define __no_sanitize_or_inline                                                \
+	__no_sanitize_address __no_sanitize_thread notrace __maybe_unused
+#else
+#define __no_sanitize_or_inline __always_inline
+#endif
+
+static __no_kcsan_or_inline
+void __read_once_size(const volatile void *p, void *res, int size)
+{
+	kcsan_check_atomic_read((const void *)p, size);
+	__READ_ONCE_SIZE;
+}
+
+static __no_sanitize_or_inline
 void __read_once_size_nocheck(const volatile void *p, void *res, int size)
 {
 	__READ_ONCE_SIZE;
 }
 
-static __always_inline void __write_once_size(volatile void *p, void *res, int size)
+static __no_kcsan_or_inline
+void __write_once_size(volatile void *p, void *res, int size)
 {
+	kcsan_check_atomic_write((const void *)p, size);
+
 	switch (size) {
 	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
 	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
new file mode 100644
index 000000000000..4203603ae852
--- /dev/null
+++ b/include/linux/kcsan-checks.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_KCSAN_CHECKS_H
+#define _LINUX_KCSAN_CHECKS_H
+
+#include <linux/types.h>
+
+/*
+ * __kcsan_*: Always available when KCSAN is enabled. These may be used even in
+ * compilation units that selectively disable KCSAN but must still use KCSAN to
+ * validate access to an address. Never use these in header files!
+ */
+#ifdef CONFIG_KCSAN
+/**
+ * __kcsan_check_watchpoint - check if a watchpoint exists
+ *
+ * Returns true if no race was detected, and we may then proceed to set up a
+ * watchpoint after. Returns false if either KCSAN is disabled or a race was
+ * encountered, and we may not set up a watchpoint after.
+ *
+ * @ptr address of access
+ * @size size of access
+ * @is_write is access a write
+ * @return true if no race was detected, false otherwise.
+ */
+bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write);
+
+/**
+ * __kcsan_setup_watchpoint - set up watchpoint and report data-races
+ *
+ * Sets up a watchpoint (if sampled), and if a racing access was observed,
+ * reports the data-race.
+ *
+ * @ptr address of access
+ * @size size of access
+ * @is_write is access a write
+ */
+void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write);
+#else
+static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
+					    size_t size, bool is_write)
+{
+	return true;
+}
+static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
+					    size_t size, bool is_write)
+{
+}
+#endif
+
+/*
+ * kcsan_*: Only available when the particular compilation unit has KCSAN
+ * instrumentation enabled. May be used in header files.
+ */
+#ifdef __SANITIZE_THREAD__
+#define kcsan_check_watchpoint __kcsan_check_watchpoint
+#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
+#else
+static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
+					  bool is_write)
+{
+	return true;
+}
+static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
+					  bool is_write)
+{
+}
+#endif
+
+/**
+ * __kcsan_check_read - check regular read access for data-races
+ *
+ * Full read access that checks watchpoint and sets up a watchpoint if this
+ * access is sampled. Note that setting up watchpoints for plain reads is
+ * required to also detect data-races with atomic accesses.
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define __kcsan_check_read(ptr, size)                                          \
+	do {                                                                   \
+		if (__kcsan_check_watchpoint(ptr, size, false))                \
+			__kcsan_setup_watchpoint(ptr, size, false);            \
+	} while (0)
+
+/**
+ * __kcsan_check_write - check regular write access for data-races
+ *
+ * Full write access that checks watchpoint and sets up a watchpoint if this
+ * access is sampled.
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define __kcsan_check_write(ptr, size)                                         \
+	do {                                                                   \
+		if (__kcsan_check_watchpoint(ptr, size, true) &&               \
+		    !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
+			__kcsan_setup_watchpoint(ptr, size, true);             \
+	} while (0)
+
+/**
+ * kcsan_check_read - check regular read access for data-races
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define kcsan_check_read(ptr, size)                                            \
+	do {                                                                   \
+		if (kcsan_check_watchpoint(ptr, size, false))                  \
+			kcsan_setup_watchpoint(ptr, size, false);              \
+	} while (0)
+
+/**
+ * kcsan_check_write - check regular write access for data-races
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define kcsan_check_write(ptr, size)                                           \
+	do {                                                                   \
+		if (kcsan_check_watchpoint(ptr, size, true) &&                 \
+		    !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
+			kcsan_setup_watchpoint(ptr, size, true);               \
+	} while (0)
+
+/*
+ * Check for atomic accesses: if atomic accesses are not ignored, this simply
+ * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
+ */
+#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
+#define kcsan_check_atomic_read(...)                                           \
+	do {                                                                   \
+	} while (0)
+#define kcsan_check_atomic_write(...)                                          \
+	do {                                                                   \
+	} while (0)
+#else
+#define kcsan_check_atomic_read(ptr, size)                                     \
+	kcsan_check_watchpoint(ptr, size, false)
+#define kcsan_check_atomic_write(ptr, size)                                    \
+	kcsan_check_watchpoint(ptr, size, true)
+#endif
+
+#endif /* _LINUX_KCSAN_CHECKS_H */
diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
new file mode 100644
index 000000000000..fd5de2ba3a16
--- /dev/null
+++ b/include/linux/kcsan.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_KCSAN_H
+#define _LINUX_KCSAN_H
+
+#include <linux/types.h>
+#include <linux/kcsan-checks.h>
+
+#ifdef CONFIG_KCSAN
+
+/*
+ * Context for each thread of execution: for tasks, this is stored in
+ * task_struct, and interrupts access internal per-CPU storage.
+ */
+struct kcsan_ctx {
+	int disable; /* disable counter */
+	int atomic_next; /* number of following atomic ops */
+
+	/*
+	 * We use separate variables to store if we are in a nestable or flat
+	 * atomic region. This helps make sure that an atomic region with
+	 * nesting support is not suddenly aborted when a flat region is
+	 * contained within. Effectively this allows supporting nesting flat
+	 * atomic regions within an outer nestable atomic region. Support for
+	 * this is required as there are cases where a seqlock reader critical
+	 * section (flat atomic region) is contained within a seqlock writer
+	 * critical section (nestable atomic region), and the "mismatching
+	 * kcsan_end_atomic()" warning would trigger otherwise.
+	 */
+	int atomic_region;
+	bool atomic_region_flat;
+};
+
+/**
+ * kcsan_init - initialize KCSAN runtime
+ */
+void kcsan_init(void);
+
+/**
+ * kcsan_disable_current - disable KCSAN for the current context
+ *
+ * Supports nesting.
+ */
+void kcsan_disable_current(void);
+
+/**
+ * kcsan_enable_current - re-enable KCSAN for the current context
+ *
+ * Supports nesting.
+ */
+void kcsan_enable_current(void);
+
+/**
+ * kcsan_begin_atomic - use to denote an atomic region
+ *
+ * Accesses within the atomic region may appear to race with other accesses but
+ * should be considered atomic.
+ *
+ * @nest true if regions may be nested, or false for flat region
+ */
+void kcsan_begin_atomic(bool nest);
+
+/**
+ * kcsan_end_atomic - end atomic region
+ *
+ * @nest must match argument to kcsan_begin_atomic().
+ */
+void kcsan_end_atomic(bool nest);
+
+/**
+ * kcsan_atomic_next - consider following accesses as atomic
+ *
+ * Force treating the next n memory accesses for the current context as atomic
+ * operations.
+ *
+ * @n number of following memory accesses to treat as atomic.
+ */
+void kcsan_atomic_next(int n);
+
+#else /* CONFIG_KCSAN */
+
+static inline void kcsan_init(void)
+{
+}
+
+static inline void kcsan_disable_current(void)
+{
+}
+
+static inline void kcsan_enable_current(void)
+{
+}
+
+static inline void kcsan_begin_atomic(bool nest)
+{
+}
+
+static inline void kcsan_end_atomic(bool nest)
+{
+}
+
+static inline void kcsan_atomic_next(int n)
+{
+}
+
+#endif /* CONFIG_KCSAN */
+
+#endif /* _LINUX_KCSAN_H */
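
Note on the nesting comment in struct kcsan_ctx above: the following is a
minimal user-space sketch of why the nestable counter and the flat flag are
kept separate. The model_* names are hypothetical and not part of this series;
the logic follows kcsan_begin_atomic()/kcsan_end_atomic() as posted in core.c,
except that a mismatched end returns false here instead of triggering WARN().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two atomic-region fields of struct kcsan_ctx. */
struct model_ctx {
	int atomic_region;       /* nestable regions: a depth counter */
	bool atomic_region_flat; /* flat region: a single flag */
};

static void model_begin_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

/* Returns false on a mismatched end (the posted code warns instead). */
static bool model_end_atomic(struct model_ctx *ctx, bool nest)
{
	if (!nest) {
		ctx->atomic_region_flat = false;
		return true;
	}
	if (ctx->atomic_region == 0)
		return false;
	--ctx->atomic_region;
	return true;
}

/* Same condition that is_atomic() in core.c applies to the context. */
static bool model_in_atomic(const struct model_ctx *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

With a single shared flag, ending the inner flat region (the seqlock reader
case) would also end the outer nestable region (the seqlock writer case);
with separate fields the outer region stays active until its own end call.
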
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c2e56bd8913..9490e417bf4a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -31,6 +31,7 @@
 #include <linux/task_io_accounting.h>
 #include <linux/posix-timers.h>
 #include <linux/rseq.h>
+#include <linux/kcsan.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -1171,6 +1172,9 @@ struct task_struct {
 #ifdef CONFIG_KASAN
 	unsigned int			kasan_depth;
 #endif
+#ifdef CONFIG_KCSAN
+	struct kcsan_ctx		kcsan_ctx;
+#endif
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	/* Index of current stored address in ret_stack: */
diff --git a/init/init_task.c b/init/init_task.c
index 9e5cbe5eab7b..e229416c3314 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -161,6 +161,14 @@ struct task_struct init_task
 #ifdef CONFIG_KASAN
 	.kasan_depth	= 1,
 #endif
+#ifdef CONFIG_KCSAN
+	.kcsan_ctx = {
+		.disable		= 1,
+		.atomic_next		= 0,
+		.atomic_region		= 0,
+		.atomic_region_flat	= 0,
+	},
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	.softirqs_enabled = 1,
 #endif
diff --git a/init/main.c b/init/main.c
index 91f6ebb30ef0..4d814de017ee 100644
--- a/init/main.c
+++ b/init/main.c
@@ -93,6 +93,7 @@
 #include <linux/rodata_test.h>
 #include <linux/jump_label.h>
 #include <linux/mem_encrypt.h>
+#include <linux/kcsan.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
 	acpi_subsystem_init();
 	arch_post_acpi_subsys_init();
 	sfi_init_late();
+	kcsan_init();
 
 	/* Do the rest non-__init'ed, we're now alive */
 	arch_call_rest_init();
diff --git a/kernel/Makefile b/kernel/Makefile
index daad787fb795..74ab46e2ebd1 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_KCSAN) += kcsan/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
new file mode 100644
index 000000000000..c25f07062d26
--- /dev/null
+++ b/kernel/kcsan/Makefile
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+KCSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+
+CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
+
+CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+
+obj-y := kcsan.o core.o atomic.o debugfs.o report.o
+obj-$(CONFIG_KCSAN_SELFTEST) += test.o
diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
new file mode 100644
index 000000000000..dd44f7d9e491
--- /dev/null
+++ b/kernel/kcsan/atomic.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/jiffies.h>
+
+#include "kcsan.h"
+
+/*
+ * List all volatile globals that have been observed in races, to suppress
+ * data-race reports between accesses to these variables.
+ *
+ * For now, we assume that volatile accesses of globals are as strong as atomic
+ * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
+ * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
+ * than cast to volatile. Eventually, we hope to be able to remove this
+ * function.
+ */
+bool kcsan_is_atomic(const volatile void *ptr)
+{
+	/* only jiffies for now */
+	return ptr == &jiffies;
+}
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
new file mode 100644
index 000000000000..bc8d60b129eb
--- /dev/null
+++ b/kernel/kcsan/core.c
@@ -0,0 +1,428 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+
+#include "kcsan.h"
+#include "encoding.h"
+
+/*
+ * Helper macros to iterate slots, starting from address slot itself, followed
+ * by the right and left slots.
+ */
+#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
+#define SLOT_IDX(slot, i)                                                      \
+	((slot + KCSAN_NUM_WATCHPOINTS +                                       \
+	  (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -                    \
+	   KCSAN_CHECK_ADJACENT)) % KCSAN_NUM_WATCHPOINTS)
+
+bool kcsan_enabled;
+
+/* Per-CPU kcsan_ctx for interrupts */
+static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
+	.disable = 0,
+	.atomic_next = 0,
+	.atomic_region = 0,
+	.atomic_region_flat = 0,
+};
+
+/*
+ * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
+ * able to safely update and access a watchpoint without introducing locking
+ * overhead, we encode each watchpoint as a single atomic long. The initial
+ * zero-initialized state matches INVALID_WATCHPOINT.
+ */
+static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
+
+/*
+ * Instructions skipped counter; see should_watch().
+ */
+static DEFINE_PER_CPU(unsigned long, kcsan_skip);
+
+static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
+					     bool expect_write,
+					     long *encoded_watchpoint)
+{
+	const int slot = watchpoint_slot(addr);
+	const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
+	atomic_long_t *watchpoint;
+	unsigned long wp_addr_masked;
+	size_t wp_size;
+	bool is_write;
+	int i;
+
+	for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
+		watchpoint = &watchpoints[SLOT_IDX(slot, i)];
+		*encoded_watchpoint = atomic_long_read(watchpoint);
+		if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
+				       &wp_size, &is_write))
+			continue;
+
+		if (expect_write && !is_write)
+			continue;
+
+		/* Check if the watchpoint matches the access. */
+		if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
+			return watchpoint;
+	}
+
+	return NULL;
+}
+
+static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
+					       bool is_write)
+{
+	const int slot = watchpoint_slot(addr);
+	const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
+	atomic_long_t *watchpoint;
+	int i;
+
+	for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
+		long expect_val = INVALID_WATCHPOINT;
+
+		/* Try to acquire this slot. */
+		watchpoint = &watchpoints[SLOT_IDX(slot, i)];
+		if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
+						    encoded_watchpoint))
+			return watchpoint;
+	}
+
+	return NULL;
+}
+
+/*
+ * Return true if watchpoint was successfully consumed, false otherwise.
+ *
+ * This may return false if:
+ *
+ *	1. another thread already consumed the watchpoint;
+ *	2. the thread that set up the watchpoint already removed it;
+ *	3. the watchpoint was removed and then re-used.
+ */
+static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
+					  long encoded_watchpoint)
+{
+	return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
+					       CONSUMED_WATCHPOINT);
+}
+
+/*
+ * Return true if watchpoint was not touched, false if consumed.
+ */
+static inline bool remove_watchpoint(atomic_long_t *watchpoint)
+{
+	return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
+	       CONSUMED_WATCHPOINT;
+}
+
+static inline struct kcsan_ctx *get_ctx(void)
+{
+	/*
+	 * In interrupts, use raw_cpu_ptr to avoid unnecessary checks that would
+	 * also result in calls that generate warnings in uaccess regions.
+	 */
+	return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
+}
+
+static inline bool is_atomic(const volatile void *ptr)
+{
+	struct kcsan_ctx *ctx = get_ctx();
+
+	if (unlikely(ctx->atomic_next > 0)) {
+		--ctx->atomic_next;
+		return true;
+	}
+	if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
+		return true;
+
+	return kcsan_is_atomic(ptr);
+}
+
+static inline bool should_watch(const volatile void *ptr)
+{
+	/*
+	 * Never set up watchpoints when memory operations are atomic.
+	 *
+	 * We need to check this first, because: 1) atomics should not count
+	 * towards skipped instructions below, and 2) to actually decrement
+	 * kcsan_atomic_next for each atomic.
+	 */
+	if (is_atomic(ptr))
+		return false;
+
+	/*
+	 * We use a per-CPU counter, to avoid excessive contention; there is
+	 * still enough non-determinism for the precise instructions that end up
+	 * being watched to be mostly unpredictable. Using a PRNG like
+	 * prandom_u32() turned out to be too slow.
+	 */
+	return (this_cpu_inc_return(kcsan_skip) %
+		CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
+}
+
+static inline bool is_enabled(void)
+{
+	return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
+}
+
+static inline unsigned int get_delay(void)
+{
+	unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
+					     CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
+	return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
+		       ((prandom_u32() % max_delay) + 1) :
+		       max_delay;
+}
+
+/* === Public interface ===================================================== */
+
+void __init kcsan_init(void)
+{
+	BUG_ON(!in_task());
+
+	kcsan_debugfs_init();
+	kcsan_enable_current();
+#ifdef CONFIG_KCSAN_EARLY_ENABLE
+	/*
+	 * We are in the init task, and no other tasks should be running.
+	 */
+	WRITE_ONCE(kcsan_enabled, true);
+#endif
+}
+
+/* === Exported interface =================================================== */
+
+void kcsan_disable_current(void)
+{
+	++get_ctx()->disable;
+}
+EXPORT_SYMBOL(kcsan_disable_current);
+
+void kcsan_enable_current(void)
+{
+	if (get_ctx()->disable-- == 0) {
+		kcsan_disable_current(); /* restore to 0 */
+		kcsan_disable_current();
+		WARN(1, "mismatching %s", __func__);
+		kcsan_enable_current();
+	}
+}
+EXPORT_SYMBOL(kcsan_enable_current);
+
+void kcsan_begin_atomic(bool nest)
+{
+	if (nest)
+		++get_ctx()->atomic_region;
+	else
+		get_ctx()->atomic_region_flat = true;
+}
+EXPORT_SYMBOL(kcsan_begin_atomic);
+
+void kcsan_end_atomic(bool nest)
+{
+	if (nest) {
+		if (get_ctx()->atomic_region-- == 0) {
+			kcsan_begin_atomic(true); /* restore to 0 */
+			kcsan_disable_current();
+			WARN(1, "mismatching %s", __func__);
+			kcsan_enable_current();
+		}
+	} else {
+		get_ctx()->atomic_region_flat = false;
+	}
+}
+EXPORT_SYMBOL(kcsan_end_atomic);
+
+void kcsan_atomic_next(int n)
+{
+	get_ctx()->atomic_next = n;
+}
+EXPORT_SYMBOL(kcsan_atomic_next);
+
+bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write)
+{
+	atomic_long_t *watchpoint;
+	long encoded_watchpoint;
+	unsigned long flags;
+	enum kcsan_report_type report_type;
+
+	if (unlikely(!is_enabled()))
+		return false;
+
+	/*
+	 * Avoid user_access_save in fast-path here: find_watchpoint is safe
+	 * without user_access_save, as the address that ptr points to is only
+	 * used to check if a watchpoint exists; ptr is never dereferenced.
+	 */
+	watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
+				     &encoded_watchpoint);
+	if (watchpoint == NULL)
+		return true;
+
+	flags = user_access_save();
+	if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
+		/*
+		 * The other thread may not print any diagnostics, as it has
+		 * already removed the watchpoint, or another thread consumed
+		 * the watchpoint before this thread.
+		 */
+		kcsan_counter_inc(kcsan_counter_report_races);
+		report_type = kcsan_report_race_check_race;
+	} else {
+		report_type = kcsan_report_race_check;
+	}
+
+	/* Encountered a data-race. */
+	kcsan_counter_inc(kcsan_counter_data_races);
+	kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
+
+	user_access_restore(flags);
+	return false;
+}
+EXPORT_SYMBOL(__kcsan_check_watchpoint);
+
+void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write)
+{
+	atomic_long_t *watchpoint;
+	union {
+		u8 _1;
+		u16 _2;
+		u32 _4;
+		u64 _8;
+	} expect_value;
+	bool is_expected = true;
+	unsigned long ua_flags = user_access_save();
+	unsigned long irq_flags;
+
+	if (!should_watch(ptr))
+		goto out;
+
+	if (!check_encodable((unsigned long)ptr, size)) {
+		kcsan_counter_inc(kcsan_counter_unencodable_accesses);
+		goto out;
+	}
+
+	/*
+	 * Disable interrupts & preemptions to avoid another thread on the same
+	 * CPU accessing memory locations for the set up watchpoint; this is to
+	 * avoid reporting races to e.g. CPU-local data.
+	 *
+	 * An alternative would be adding the source CPU to the watchpoint
+	 * encoding, and checking that watchpoint-CPU != this-CPU. There are
+	 * several problems with this:
+	 *   1. we should avoid stealing more bits from the watchpoint encoding
+	 *      as it would affect accuracy, as well as increase performance
+	 *      overhead in the fast-path;
+	 *   2. if we are preempted, but there *is* a genuine data-race, we
+	 *      would *not* report it -- since this is the common case (vs.
+	 *      CPU-local data accesses), it makes more sense (from a data-race
+	 *      detection PoV) to simply disable preemptions to ensure as many
+	 *      tasks as possible run on other CPUs.
+	 */
+	local_irq_save(irq_flags);
+
+	watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
+	if (watchpoint == NULL) {
+		/*
+		 * Out of capacity: the size of `watchpoints`, and the frequency
+		 * with which `should_watch()` returns true should be tweaked so
+		 * that this case happens very rarely.
+		 */
+		kcsan_counter_inc(kcsan_counter_no_capacity);
+		goto out_unlock;
+	}
+
+	kcsan_counter_inc(kcsan_counter_setup_watchpoints);
+	kcsan_counter_inc(kcsan_counter_used_watchpoints);
+
+	/*
+	 * Read the current value, to later check and infer a race if the data
+	 * was modified via a non-instrumented access, e.g. from a device.
+	 */
+	switch (size) {
+	case 1:
+		expect_value._1 = READ_ONCE(*(const u8 *)ptr);
+		break;
+	case 2:
+		expect_value._2 = READ_ONCE(*(const u16 *)ptr);
+		break;
+	case 4:
+		expect_value._4 = READ_ONCE(*(const u32 *)ptr);
+		break;
+	case 8:
+		expect_value._8 = READ_ONCE(*(const u64 *)ptr);
+		break;
+	default:
+		break; /* ignore; we do not diff the values */
+	}
+
+#ifdef CONFIG_KCSAN_DEBUG
+	kcsan_disable_current();
+	pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
+	       is_write ? "write" : "read", size, ptr,
+	       watchpoint_slot((unsigned long)ptr),
+	       encode_watchpoint((unsigned long)ptr, size, is_write));
+	kcsan_enable_current();
+#endif
+
+	/*
+	 * Delay this thread, to increase probability of observing a racy
+	 * conflicting access.
+	 */
+	udelay(get_delay());
+
+	/*
+	 * Re-read value, and check if it is as expected; if not, we infer a
+	 * racy access.
+	 */
+	switch (size) {
+	case 1:
+		is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
+		break;
+	case 2:
+		is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
+		break;
+	case 4:
+		is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
+		break;
+	case 8:
+		is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
+		break;
+	default:
+		break; /* ignore; we do not diff the values */
+	}
+
+	/* Check if this access raced with another. */
+	if (!remove_watchpoint(watchpoint)) {
+		/*
+		 * No need to increment 'race' counter, as the racing thread
+		 * already did.
+		 */
+		kcsan_report(ptr, size, is_write, smp_processor_id(),
+			     kcsan_report_race_setup);
+	} else if (!is_expected) {
+		/* Inferring a race, since the value should not have changed. */
+		kcsan_counter_inc(kcsan_counter_races_unknown_origin);
+#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
+		kcsan_report(ptr, size, is_write, smp_processor_id(),
+			     kcsan_report_race_unknown_origin);
+#endif
+	}
+
+	kcsan_counter_dec(kcsan_counter_used_watchpoints);
+out_unlock:
+	local_irq_restore(irq_flags);
+out:
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__kcsan_setup_watchpoint);
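
Note on the watchpoint protocol in core.c above: each slot moves between three
states (invalid, encoded, consumed) using only compare-and-exchange and
exchange. The following is a user-space sketch of that state machine with C11
atomics. The model_* names are hypothetical, and the constants 0 and 1 are
taken from INVALID_WATCHPOINT and CONSUMED_WATCHPOINT in encoding.h; this is
an illustration of the idea, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_INVALID  0L /* slot is free */
#define MODEL_CONSUMED 1L /* a racing access claimed the watchpoint */

/* Setup side: claim a free slot, as insert_watchpoint() does per slot. */
static bool model_insert(atomic_long *slot, long encoded)
{
	long expect = MODEL_INVALID;

	assert(encoded != MODEL_INVALID && encoded != MODEL_CONSUMED);
	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Checking side: succeeds for at most one racing access per watchpoint. */
static bool model_try_consume(atomic_long *slot, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &encoded, MODEL_CONSUMED,
		memory_order_relaxed, memory_order_relaxed);
}

/* Setup side, after the delay: true if untouched, false if consumed. */
static bool model_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, MODEL_INVALID,
					memory_order_relaxed) != MODEL_CONSUMED;
}
```

A false return from model_remove() corresponds to the kcsan_report_race_setup
path: the thread that set up the watchpoint learns that another access
consumed it during the delay.
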
diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
new file mode 100644
index 000000000000..6ddcbd185f3a
--- /dev/null
+++ b/kernel/kcsan/debugfs.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/atomic.h>
+#include <linux/bsearch.h>
+#include <linux/bug.h>
+#include <linux/debugfs.h>
+#include <linux/init.h>
+#include <linux/kallsyms.h>
+#include <linux/mm.h>
+#include <linux/seq_file.h>
+#include <linux/sort.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+
+#include "kcsan.h"
+
+/*
+ * Statistics counters.
+ */
+static atomic_long_t counters[kcsan_counter_count];
+
+/*
+ * Addresses for filtering functions from reporting. This list can be used as a
+ * whitelist or blacklist.
+ */
+static struct {
+	unsigned long *addrs; /* array of addresses */
+	size_t size; /* current size */
+	int used; /* number of elements used */
+	bool sorted; /* if elements are sorted */
+	bool whitelist; /* if list is a blacklist or whitelist */
+} report_filterlist = {
+	.addrs = NULL,
+	.size = 8, /* small initial size */
+	.used = 0,
+	.sorted = false,
+	.whitelist = false, /* default is blacklist */
+};
+static DEFINE_SPINLOCK(report_filterlist_lock);
+
+static const char *counter_to_name(enum kcsan_counter_id id)
+{
+	switch (id) {
+	case kcsan_counter_used_watchpoints:
+		return "used_watchpoints";
+	case kcsan_counter_setup_watchpoints:
+		return "setup_watchpoints";
+	case kcsan_counter_data_races:
+		return "data_races";
+	case kcsan_counter_no_capacity:
+		return "no_capacity";
+	case kcsan_counter_report_races:
+		return "report_races";
+	case kcsan_counter_races_unknown_origin:
+		return "races_unknown_origin";
+	case kcsan_counter_unencodable_accesses:
+		return "unencodable_accesses";
+	case kcsan_counter_encoding_false_positives:
+		return "encoding_false_positives";
+	case kcsan_counter_count:
+		BUG();
+	}
+	return NULL;
+}
+
+void kcsan_counter_inc(enum kcsan_counter_id id)
+{
+	atomic_long_inc(&counters[id]);
+}
+
+void kcsan_counter_dec(enum kcsan_counter_id id)
+{
+	atomic_long_dec(&counters[id]);
+}
+
+static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
+{
+	const unsigned long a = *(const unsigned long *)rhs;
+	const unsigned long b = *(const unsigned long *)lhs;
+
+	return a < b ? -1 : a == b ? 0 : 1;
+}
+
+bool kcsan_skip_report(unsigned long func_addr)
+{
+	unsigned long symbolsize, offset;
+	unsigned long flags;
+	bool ret = false;
+
+	if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
+		return false;
+	func_addr -= offset; /* get function start */
+
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+	if (report_filterlist.used == 0)
+		goto out;
+
+	/* Sort array if it is unsorted, and then do a binary search. */
+	if (!report_filterlist.sorted) {
+		sort(report_filterlist.addrs, report_filterlist.used,
+		     sizeof(unsigned long), cmp_filterlist_addrs, NULL);
+		report_filterlist.sorted = true;
+	}
+	ret = !!bsearch(&func_addr, report_filterlist.addrs,
+			report_filterlist.used, sizeof(unsigned long),
+			cmp_filterlist_addrs);
+	if (report_filterlist.whitelist)
+		ret = !ret;
+
+out:
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+	return ret;
+}
+
+static void set_report_filterlist_whitelist(bool whitelist)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+	report_filterlist.whitelist = whitelist;
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+}
+
+static void insert_report_filterlist(const char *func)
+{
+	unsigned long flags;
+	unsigned long addr = kallsyms_lookup_name(func);
+
+	if (!addr) {
+		pr_err("KCSAN: could not find function: '%s'\n", func);
+		return;
+	}
+
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+
+	if (report_filterlist.addrs == NULL)
+		report_filterlist.addrs = /* initial allocation */
+			kvmalloc_array(report_filterlist.size,
+				       sizeof(unsigned long), GFP_KERNEL);
+	else if (report_filterlist.used == report_filterlist.size) {
+		/* resize filterlist */
+		unsigned long *new_addrs;
+
+		report_filterlist.size *= 2;
+		new_addrs = kvmalloc_array(report_filterlist.size,
+					   sizeof(unsigned long), GFP_KERNEL);
+		memcpy(new_addrs, report_filterlist.addrs,
+		       report_filterlist.used * sizeof(unsigned long));
+		kvfree(report_filterlist.addrs);
+		report_filterlist.addrs = new_addrs;
+	}
+
+	/* Note: deduplicating should be done in userspace. */
+	report_filterlist.addrs[report_filterlist.used++] =
+		kallsyms_lookup_name(func);
+	report_filterlist.sorted = false;
+
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+}
+
+static int show_info(struct seq_file *file, void *v)
+{
+	int i;
+	unsigned long flags;
+
+	/* show stats */
+	seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
+	for (i = 0; i < kcsan_counter_count; ++i)
+		seq_printf(file, "%s: %ld\n", counter_to_name(i),
+			   atomic_long_read(&counters[i]));
+
+	/* show filter functions, and filter type */
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+	seq_printf(file, "\n%s functions: %s\n",
+		   report_filterlist.whitelist ? "whitelisted" : "blacklisted",
+		   report_filterlist.used == 0 ? "none" : "");
+	for (i = 0; i < report_filterlist.used; ++i)
+		seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+
+	return 0;
+}
+
+static int debugfs_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, show_info, NULL);
+}
+
+static ssize_t debugfs_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *off)
+{
+	char kbuf[KSYM_NAME_LEN];
+	char *arg;
+	int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+	arg = strstrip(kbuf);
+
+	if (!strcmp(arg, "on"))
+		WRITE_ONCE(kcsan_enabled, true);
+	else if (!strcmp(arg, "off"))
+		WRITE_ONCE(kcsan_enabled, false);
+	else if (!strcmp(arg, "whitelist"))
+		set_report_filterlist_whitelist(true);
+	else if (!strcmp(arg, "blacklist"))
+		set_report_filterlist_whitelist(false);
+	else if (arg[0] == '!')
+		insert_report_filterlist(&arg[1]);
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations debugfs_ops = { .read = seq_read,
+						    .open = debugfs_open,
+						    .write = debugfs_write,
+						    .release = single_release };
+
+void __init kcsan_debugfs_init(void)
+{
+	debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
+}
diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
new file mode 100644
index 000000000000..8f9b1ce0e59f
--- /dev/null
+++ b/kernel/kcsan/encoding.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _MM_KCSAN_ENCODING_H
+#define _MM_KCSAN_ENCODING_H
+
+#include <linux/bits.h>
+#include <linux/log2.h>
+#include <linux/mm.h>
+
+#include "kcsan.h"
+
+#define SLOT_RANGE PAGE_SIZE
+#define INVALID_WATCHPOINT 0
+#define CONSUMED_WATCHPOINT 1
+
+/*
+ * The maximum useful size of accesses for which we set up watchpoints is the
+ * max range of slots we check on an access.
+ */
+#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
+
+/*
+ * Number of bits we use to store size info.
+ */
+#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
+/*
+ * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
+ * however, most 64-bit architectures do not use the full 64-bit address space.
+ * Also, in order for a false positive to be observable 2 things need to happen:
+ *
+ *	1. different addresses but with the same encoded address race;
+ *	2. and both map onto the same watchpoint slots;
+ *
+ * Both of these are assumed to be very unlikely. However, in case it still
+ * happens, the report logic will filter out the false positive (see report.c).
+ */
+#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
+
+/*
+ * Masks to set/retrieve the encoded data.
+ */
+#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
+#define WATCHPOINT_SIZE_MASK                                                   \
+	GENMASK(BITS_PER_LONG - 2, WATCHPOINT_ADDR_BITS)
+#define WATCHPOINT_ADDR_MASK                                                   \
+	GENMASK(WATCHPOINT_ADDR_BITS - 1, 0)
+
+static inline bool check_encodable(unsigned long addr, size_t size)
+{
+	return size <= MAX_ENCODABLE_SIZE;
+}
+
+static inline long encode_watchpoint(unsigned long addr, size_t size,
+				     bool is_write)
+{
+	return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
+		      (size << WATCHPOINT_ADDR_BITS) |
+		      (addr & WATCHPOINT_ADDR_MASK));
+}
+
+static inline bool decode_watchpoint(long watchpoint,
+				     unsigned long *addr_masked, size_t *size,
+				     bool *is_write)
+{
+	if (watchpoint == INVALID_WATCHPOINT ||
+	    watchpoint == CONSUMED_WATCHPOINT)
+		return false;
+
+	*addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
+	*size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
+		WATCHPOINT_ADDR_BITS;
+	*is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
+
+	return true;
+}
+
+/*
+ * Return watchpoint slot for an address.
+ */
+static inline int watchpoint_slot(unsigned long addr)
+{
+	return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
+}
+
+static inline bool matching_access(unsigned long addr1, size_t size1,
+				   unsigned long addr2, size_t size2)
+{
+	unsigned long end_range1 = addr1 + size1 - 1;
+	unsigned long end_range2 = addr2 + size2 - 1;
+
+	return addr1 <= end_range2 && addr2 <= end_range1;
+}
+
+#endif /* _MM_KCSAN_ENCODING_H */
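
Note on the encoding in encoding.h above: the comments describe a single long
laid out as 1 is-write bit, then the size bits, then the address bits. The
following user-space sketch models that layout. All MODEL_* and model_* names
are hypothetical. The sketch assumes a 64-bit long, and 14 size bits, which is
what bits_per(4096 * 2) would give if PAGE_SIZE is 4096 and
KCSAN_CHECK_ADJACENT is 1; neither value is confirmed by this part of the
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BITS       64
#define MODEL_SIZE_BITS  14 /* assumption, see text above */
#define MODEL_ADDR_BITS  (MODEL_BITS - 1 - MODEL_SIZE_BITS)

#define MODEL_WRITE_MASK ((uint64_t)1 << (MODEL_BITS - 1))
#define MODEL_SIZE_MASK \
	((((uint64_t)1 << MODEL_SIZE_BITS) - 1) << MODEL_ADDR_BITS)
#define MODEL_ADDR_MASK  (((uint64_t)1 << MODEL_ADDR_BITS) - 1)

/* Pack is-write, size and the low address bits into one word. */
static uint64_t model_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? MODEL_WRITE_MASK : 0) |
	       ((uint64_t)size << MODEL_ADDR_BITS) |
	       (addr & MODEL_ADDR_MASK);
}

/* Unpack; 0 (invalid) and 1 (consumed) are reserved and never decode. */
static bool model_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			 bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & MODEL_ADDR_MASK;
	*size = (size_t)((wp & MODEL_SIZE_MASK) >> MODEL_ADDR_BITS);
	*is_write = (wp & MODEL_WRITE_MASK) != 0;
	return true;
}

/* Closed-interval overlap test, same shape as matching_access(). */
static bool model_matching_access(uint64_t addr1, size_t size1,
				  uint64_t addr2, size_t size2)
{
	uint64_t end1 = addr1 + size1 - 1;
	uint64_t end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

Because the upper address bits are dropped, two different addresses can encode
to the same value; as the comment in encoding.h says, the report logic is
expected to filter out such false positives.
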
diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
new file mode 100644
index 000000000000..45cf2fffd8a0
--- /dev/null
+++ b/kernel/kcsan/kcsan.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
+ * see Documentation/dev-tools/kcsan.rst.
+ */
+
+#include <linux/export.h>
+
+#include "kcsan.h"
+
+/*
+ * KCSAN uses the same instrumentation that is emitted by supported compilers
+ * for Thread Sanitizer (TSAN).
+ *
+ * When enabled, the compiler emits instrumentation calls (the functions
+ * prefixed with "__tsan" below) for all loads and stores that it generated;
+ * inline asm is not instrumented.
+ */
+
+#define DEFINE_TSAN_READ_WRITE(size)                                           \
+	void __tsan_read##size(void *ptr)                                      \
+	{                                                                      \
+		__kcsan_check_read(ptr, size);                                 \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_read##size);                                      \
+	void __tsan_write##size(void *ptr)                                     \
+	{                                                                      \
+		__kcsan_check_write(ptr, size);                                \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_write##size)
+
+DEFINE_TSAN_READ_WRITE(1);
+DEFINE_TSAN_READ_WRITE(2);
+DEFINE_TSAN_READ_WRITE(4);
+DEFINE_TSAN_READ_WRITE(8);
+DEFINE_TSAN_READ_WRITE(16);
+
+/*
+ * Not all supported compiler versions distinguish aligned/unaligned accesses,
+ * but e.g. recent versions of Clang do.
+ */
+#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
+	void __tsan_unaligned_read##size(void *ptr)                            \
+	{                                                                      \
+		__kcsan_check_read(ptr, size);                                 \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
+	void __tsan_unaligned_write##size(void *ptr)                           \
+	{                                                                      \
+		__kcsan_check_write(ptr, size);                                \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_unaligned_write##size)
+
+DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
+DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
+DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
+DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
+
+void __tsan_read_range(void *ptr, size_t size)
+{
+	__kcsan_check_read(ptr, size);
+}
+EXPORT_SYMBOL(__tsan_read_range);
+
+void __tsan_write_range(void *ptr, size_t size)
+{
+	__kcsan_check_write(ptr, size);
+}
+EXPORT_SYMBOL(__tsan_write_range);
+
+/*
+ * The below are not required by KCSAN, but can still be emitted by compilers.
+ */
+void __tsan_func_entry(void *call_pc)
+{
+}
+EXPORT_SYMBOL(__tsan_func_entry);
+void __tsan_func_exit(void)
+{
+}
+EXPORT_SYMBOL(__tsan_func_exit);
+void __tsan_init(void)
+{
+}
+EXPORT_SYMBOL(__tsan_init);
diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
new file mode 100644
index 000000000000..429479b3041d
--- /dev/null
+++ b/kernel/kcsan/kcsan.h
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _MM_KCSAN_KCSAN_H
+#define _MM_KCSAN_KCSAN_H
+
+#include <linux/kcsan.h>
+
+/*
+ * Total number of watchpoints. An address range maps into a specific slot as
+ * specified in `encoding.h`. Although a larger number of watchpoints may not
+ * even be usable due to limited thread count, a larger value will improve
+ * performance due to reducing cache-line contention.
+ */
+#define KCSAN_NUM_WATCHPOINTS 64
+
+/*
+ * The number of adjacent watchpoints to check; the purpose is 2-fold:
+ *
+ *	1. the address slot is already occupied, check if any adjacent slots are
+ *	   free;
+ *	2. accesses that straddle a slot boundary due to size that exceeds a
+ *	   slot's range may check adjacent slots if any watchpoint matches.
+ *
+ * Note that accesses with very large size may still miss a watchpoint; however,
+ * given this should be rare, this is a reasonable trade-off to make, since this
+ * will:
+ *
+ *	1. avoid excessive contention between watchpoint checks and setup;
+ *	2. allow a larger number of simultaneous watchpoints without sacrificing
+ *	   performance.
+ */
+#define KCSAN_CHECK_ADJACENT 1
+
+/*
+ * Globally enable and disable KCSAN.
+ */
+extern bool kcsan_enabled;
+
+/*
+ * Helper that returns true if access to ptr should be considered as an atomic
+ * access, even though it is not explicitly atomic.
+ */
+bool kcsan_is_atomic(const volatile void *ptr);
+
+/*
+ * Initialize debugfs file.
+ */
+void kcsan_debugfs_init(void);
+
+enum kcsan_counter_id {
+	/*
+	 * Number of watchpoints currently in use.
+	 */
+	kcsan_counter_used_watchpoints,
+
+	/*
+	 * Total number of watchpoints set up.
+	 */
+	kcsan_counter_setup_watchpoints,
+
+	/*
+	 * Total number of data-races.
+	 */
+	kcsan_counter_data_races,
+
+	/*
+	 * Number of times no watchpoints were available.
+	 */
+	kcsan_counter_no_capacity,
+
+	/*
+	 * A thread checking a watchpoint raced with another checking thread;
+	 * only one will be reported.
+	 */
+	kcsan_counter_report_races,
+
+	/*
+	 * Observed data value change, but writer thread unknown.
+	 */
+	kcsan_counter_races_unknown_origin,
+
+	/*
+	 * The access cannot be encoded to a valid watchpoint.
+	 */
+	kcsan_counter_unencodable_accesses,
+
+	/*
+	 * Watchpoint encoding caused a watchpoint to fire on mismatching
+	 * accesses.
+	 */
+	kcsan_counter_encoding_false_positives,
+
+	kcsan_counter_count, /* number of counters */
+};
+
+/*
+ * Increment/decrement counter with given id; avoid calling these in fast-path.
+ */
+void kcsan_counter_inc(enum kcsan_counter_id id);
+void kcsan_counter_dec(enum kcsan_counter_id id);
+
+/*
+ * Returns true if data-races in the function symbol that maps to addr (offsets
+ * are ignored) should *not* be reported.
+ */
+bool kcsan_skip_report(unsigned long func_addr);
+
+enum kcsan_report_type {
+	/*
+	 * The thread that set up the watchpoint and briefly stalled was
+	 * signalled that another thread triggered the watchpoint, and thus a
+	 * race was encountered.
+	 */
+	kcsan_report_race_setup,
+
+	/*
+	 * A thread encountered a watchpoint for the access, therefore a race
+	 * was encountered.
+	 */
+	kcsan_report_race_check,
+
+	/*
+	 * A thread encountered a watchpoint for the access, but the other
+	 * racing thread can no longer be signaled that a race occurred.
+	 */
+	kcsan_report_race_check_race,
+
+	/*
+	 * No other thread was observed to race with the access, but the data
+	 * value before and after the stall differs.
+	 */
+	kcsan_report_race_unknown_origin,
+};
+/*
+ * Print a race report from thread that encountered the race.
+ */
+void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
+		  int cpu_id, enum kcsan_report_type type);
+
+#endif /* _MM_KCSAN_KCSAN_H */
diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
new file mode 100644
index 000000000000..517db539e4e7
--- /dev/null
+++ b/kernel/kcsan/report.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kernel.h>
+#include <linux/preempt.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/stacktrace.h>
+
+#include "kcsan.h"
+#include "encoding.h"
+
+/*
+ * Max. number of stack entries to show in the report.
+ */
+#define NUM_STACK_ENTRIES 16
+
+/*
+ * Other thread info: communicated from other racing thread to thread that set
+ * up the watchpoint, which then prints the complete report atomically. Only
+ * need one struct, as all threads should be serialized regardless to print
+ * the reports, with reporting being in the slow-path.
+ */
+static struct {
+	const volatile void *ptr;
+	size_t size;
+	bool is_write;
+	int task_pid;
+	int cpu_id;
+	unsigned long stack_entries[NUM_STACK_ENTRIES];
+	int num_stack_entries;
+} other_info = { .ptr = NULL };
+
+static DEFINE_SPINLOCK(other_info_lock);
+static DEFINE_SPINLOCK(report_lock);
+
+static bool set_or_lock_other_info(unsigned long *flags,
+				   const volatile void *ptr, size_t size,
+				   bool is_write, int cpu_id,
+				   enum kcsan_report_type type)
+{
+	if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
+		return true;
+
+	for (;;) {
+		spin_lock_irqsave(&other_info_lock, *flags);
+
+		switch (type) {
+		case kcsan_report_race_check:
+			if (other_info.ptr != NULL) {
+				/* still in use, retry */
+				break;
+			}
+			other_info.ptr = ptr;
+			other_info.size = size;
+			other_info.is_write = is_write;
+			other_info.task_pid =
+				in_task() ? task_pid_nr(current) : -1;
+			other_info.cpu_id = cpu_id;
+			other_info.num_stack_entries = stack_trace_save(
+				other_info.stack_entries, NUM_STACK_ENTRIES, 1);
+			/*
+			 * other_info may now be consumed by thread we raced
+			 * with.
+			 */
+			spin_unlock_irqrestore(&other_info_lock, *flags);
+			return false;
+
+		case kcsan_report_race_setup:
+			if (other_info.ptr == NULL)
+				break; /* no data available yet, retry */
+
+			/*
+			 * First check if matching based on how watchpoint was
+			 * encoded.
+			 */
+			if (!matching_access((unsigned long)other_info.ptr &
+						     WATCHPOINT_ADDR_MASK,
+					     other_info.size,
+					     (unsigned long)ptr &
+						     WATCHPOINT_ADDR_MASK,
+					     size))
+				break; /* mismatching access, retry */
+
+			if (!matching_access((unsigned long)other_info.ptr,
+					     other_info.size,
+					     (unsigned long)ptr, size)) {
+				/*
+				 * If the actual accesses do not match, this was
+				 * a false positive due to watchpoint encoding.
+				 */
+				other_info.ptr = NULL; /* mark for reuse */
+				kcsan_counter_inc(
+					kcsan_counter_encoding_false_positives);
+				spin_unlock_irqrestore(&other_info_lock,
+						       *flags);
+				return false;
+			}
+
+			/*
+			 * Matching access: keep other_info locked, as this
+			 * thread uses it to print the full report; unlocked in
+			 * end_report.
+			 */
+			return true;
+
+		default:
+			BUG();
+		}
+
+		spin_unlock_irqrestore(&other_info_lock, *flags);
+	}
+}
+
+static void start_report(unsigned long *flags, enum kcsan_report_type type)
+{
+	switch (type) {
+	case kcsan_report_race_setup:
+		/* irqsaved already via other_info_lock */
+		spin_lock(&report_lock);
+		break;
+
+	case kcsan_report_race_unknown_origin:
+		spin_lock_irqsave(&report_lock, *flags);
+		break;
+
+	default:
+		BUG();
+	}
+}
+
+static void end_report(unsigned long *flags, enum kcsan_report_type type)
+{
+	switch (type) {
+	case kcsan_report_race_setup:
+		other_info.ptr = NULL; /* mark for reuse */
+		spin_unlock(&report_lock);
+		spin_unlock_irqrestore(&other_info_lock, *flags);
+		break;
+
+	case kcsan_report_race_unknown_origin:
+		spin_unlock_irqrestore(&report_lock, *flags);
+		break;
+
+	default:
+		BUG();
+	}
+}
+
+static const char *get_access_type(bool is_write)
+{
+	return is_write ? "write" : "read";
+}
+
+/* Return thread description: in task or interrupt. */
+static const char *get_thread_desc(int task_id)
+{
+	if (task_id != -1) {
+		static char buf[32]; /* safe: protected by report_lock */
+
+		snprintf(buf, sizeof(buf), "task %i", task_id);
+		return buf;
+	}
+	return in_nmi() ? "NMI" : "interrupt";
+}
+
+/* Helper to skip KCSAN-related functions in stack-trace. */
+static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
+{
+	char buf[64];
+	int skip = 0;
+
+	for (; skip < num_entries; ++skip) {
+		snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
+		if (!strnstr(buf, "csan_", sizeof(buf)) &&
+		    !strnstr(buf, "tsan_", sizeof(buf)) &&
+		    !strnstr(buf, "_once_size", sizeof(buf))) {
+			break;
+		}
+	}
+	return skip;
+}
+
+/* Compares symbolized strings of addr1 and addr2. */
+static int sym_strcmp(void *addr1, void *addr2)
+{
+	char buf1[64];
+	char buf2[64];
+
+	snprintf(buf1, sizeof(buf1), "%pS", addr1);
+	snprintf(buf2, sizeof(buf2), "%pS", addr2);
+	return strncmp(buf1, buf2, sizeof(buf1));
+}
+
+/*
+ * Returns true if a report was generated, false otherwise.
+ */
+static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
+			  int cpu_id, enum kcsan_report_type type)
+{
+	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
+	int num_stack_entries =
+		stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
+	int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
+	int other_skipnr;
+
+	/* Check if the top stackframe is in a blacklisted function. */
+	if (kcsan_skip_report(stack_entries[skipnr]))
+		return false;
+	if (type == kcsan_report_race_setup) {
+		other_skipnr = get_stack_skipnr(other_info.stack_entries,
+						other_info.num_stack_entries);
+		if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
+			return false;
+	}
+
+	/* Print report header. */
+	pr_err("==================================================================\n");
+	switch (type) {
+	case kcsan_report_race_setup: {
+		void *this_fn = (void *)stack_entries[skipnr];
+		void *other_fn = (void *)other_info.stack_entries[other_skipnr];
+		int cmp;
+
+		/*
+		 * Order functions lexicographically for consistent bug titles.
+		 * Do not print offset of functions to keep title short.
+		 */
+		cmp = sym_strcmp(other_fn, this_fn);
+		pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
+		       cmp < 0 ? other_fn : this_fn,
+		       cmp < 0 ? this_fn : other_fn);
+	} break;
+
+	case kcsan_report_race_unknown_origin:
+		pr_err("BUG: KCSAN: data-race in %pS\n",
+		       (void *)stack_entries[skipnr]);
+		break;
+
+	default:
+		BUG();
+	}
+
+	pr_err("\n");
+
+	/* Print information about the racing accesses. */
+	switch (type) {
+	case kcsan_report_race_setup:
+		pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
+		       get_access_type(other_info.is_write), other_info.ptr,
+		       other_info.size, get_thread_desc(other_info.task_pid),
+		       other_info.cpu_id);
+
+		/* Print the other thread's stack trace. */
+		stack_trace_print(other_info.stack_entries + other_skipnr,
+				  other_info.num_stack_entries - other_skipnr,
+				  0);
+
+		pr_err("\n");
+		pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
+		       get_access_type(is_write), ptr, size,
+		       get_thread_desc(in_task() ? task_pid_nr(current) : -1),
+		       cpu_id);
+		break;
+
+	case kcsan_report_race_unknown_origin:
+		pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
+		       get_access_type(is_write), ptr, size,
+		       get_thread_desc(in_task() ? task_pid_nr(current) : -1),
+		       cpu_id);
+		break;
+
+	default:
+		BUG();
+	}
+	/* Print stack trace of this thread. */
+	stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
+			  0);
+
+	/* Print report footer. */
+	pr_err("\n");
+	pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
+	dump_stack_print_info(KERN_DEFAULT);
+	pr_err("==================================================================\n");
+
+	return true;
+}
+
+void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
+		  int cpu_id, enum kcsan_report_type type)
+{
+	unsigned long flags = 0;
+
+	if (type == kcsan_report_race_check_race)
+		return;
+
+	kcsan_disable_current();
+	if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
+		start_report(&flags, type);
+		if (print_summary(ptr, size, is_write, cpu_id, type) &&
+		    panic_on_warn)
+			panic("panic_on_warn set ...\n");
+		end_report(&flags, type);
+	}
+	kcsan_enable_current();
+}
diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
new file mode 100644
index 000000000000..68c896a24529
--- /dev/null
+++ b/kernel/kcsan/test.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+#include <linux/random.h>
+#include <linux/types.h>
+
+#include "encoding.h"
+
+#define ITERS_PER_TEST 2000
+
+/* Test requirements. */
+static bool test_requires(void)
+{
+	/* random should be initialized */
+	return prandom_u32() + prandom_u32() != 0;
+}
+
+/* Test watchpoint encode and decode. */
+static bool test_encode_decode(void)
+{
+	int i;
+
+	for (i = 0; i < ITERS_PER_TEST; ++i) {
+		size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
+		bool is_write = prandom_u32() % 2;
+		unsigned long addr;
+
+		prandom_bytes(&addr, sizeof(addr));
+		if (WARN_ON(!check_encodable(addr, size)))
+			return false;
+
+		/* encode and decode */
+		{
+			const long encoded_watchpoint =
+				encode_watchpoint(addr, size, is_write);
+			unsigned long verif_masked_addr;
+			size_t verif_size;
+			bool verif_is_write;
+
+			/* check special watchpoints */
+			if (WARN_ON(decode_watchpoint(
+				    INVALID_WATCHPOINT, &verif_masked_addr,
+				    &verif_size, &verif_is_write)))
+				return false;
+			if (WARN_ON(decode_watchpoint(
+				    CONSUMED_WATCHPOINT, &verif_masked_addr,
+				    &verif_size, &verif_is_write)))
+				return false;
+
+			/* check decoding watchpoint returns same data */
+			if (WARN_ON(!decode_watchpoint(
+				    encoded_watchpoint, &verif_masked_addr,
+				    &verif_size, &verif_is_write)))
+				return false;
+			if (WARN_ON(verif_masked_addr !=
+				    (addr & WATCHPOINT_ADDR_MASK)))
+				goto fail;
+			if (WARN_ON(verif_size != size))
+				goto fail;
+			if (WARN_ON(is_write != verif_is_write))
+				goto fail;
+
+			continue;
+fail:
+			pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
+			       __func__, is_write ? "write" : "read", size,
+			       addr, encoded_watchpoint,
+			       verif_is_write ? "write" : "read", verif_size,
+			       verif_masked_addr);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static bool test_matching_access(void)
+{
+	if (WARN_ON(!matching_access(10, 1, 10, 1)))
+		return false;
+	if (WARN_ON(!matching_access(10, 2, 11, 1)))
+		return false;
+	if (WARN_ON(!matching_access(10, 1, 9, 2)))
+		return false;
+	if (WARN_ON(matching_access(10, 1, 11, 1)))
+		return false;
+	if (WARN_ON(matching_access(9, 1, 10, 1)))
+		return false;
+	return true;
+}
+
+static int __init kcsan_selftest(void)
+{
+	int passed = 0;
+	int total = 0;
+
+#define RUN_TEST(do_test)                                                      \
+	do {                                                                   \
+		++total;                                                       \
+		if (do_test())                                                 \
+			++passed;                                              \
+		else                                                           \
+			pr_err("KCSAN selftest: " #do_test " failed\n");       \
+	} while (0)
+
+	RUN_TEST(test_requires);
+	RUN_TEST(test_encode_decode);
+	RUN_TEST(test_matching_access);
+
+	pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
+	if (passed != total)
+		panic("KCSAN selftests failed");
+	return 0;
+}
+postcore_initcall(kcsan_selftest);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 93d97f9b0157..35accd1d93de 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
 
 source "lib/Kconfig.ubsan"
 
+source "lib/Kconfig.kcsan"
+
 config ARCH_HAS_DEVMEM_IS_ALLOWED
 	bool
 
diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
new file mode 100644
index 000000000000..3e1f1acfb24b
--- /dev/null
+++ b/lib/Kconfig.kcsan
@@ -0,0 +1,88 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config HAVE_ARCH_KCSAN
+	bool
+
+menuconfig KCSAN
+	bool "KCSAN: watchpoint-based dynamic data-race detector"
+	depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
+	default n
+	help
+	  Kernel Concurrency Sanitizer is a dynamic data-race detector, which
+	  uses a watchpoint-based sampling approach to detect races.
+
+if KCSAN
+
+config KCSAN_SELFTEST
+	bool "KCSAN: perform short selftests on boot"
+	default y
+	help
+	  Run KCSAN selftests on boot. On test failure, causes kernel to panic.
+
+config KCSAN_EARLY_ENABLE
+	bool "KCSAN: early enable"
+	default y
+	help
+	  Enable KCSAN globally as soon as possible. KCSAN can later be
+	  enabled/disabled via debugfs.
+
+config KCSAN_UDELAY_MAX_TASK
+	int "KCSAN: maximum delay in microseconds (for tasks)"
+	default 80
+	help
+	  For tasks, the max. microsecond delay after setting up a watchpoint.
+
+config KCSAN_UDELAY_MAX_INTERRUPT
+	int "KCSAN: maximum delay in microseconds (for interrupts)"
+	default 20
+	help
+	  For interrupts, the max. microsecond delay after setting up a watchpoint.
+
+config KCSAN_DELAY_RANDOMIZE
+	bool "KCSAN: randomize delays"
+	default y
+	help
+	  Randomize delays; if disabled, the chosen delay is always the
+	  maximum value defined above.
+
+config KCSAN_WATCH_SKIP_INST
+	int "KCSAN: watchpoint instruction skip"
+	default 2000
+	help
+	  The number of per-CPU memory operations to skip watching, before
+	  another watchpoint is set up; in other words, 1 in
+	  KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
+	  watchpoint. A smaller value results in more aggressive race
+	  detection, whereas a larger value improves system performance at the
+	  cost of missing some races.
+
+config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
+	bool "KCSAN: report races of unknown origin"
+	default y
+	help
+	  Report races where only one access is known, and the conflicting
+	  access is of unknown origin. This type of race is reported if it was
+	  only possible to infer a race due to a data-value change while an
+	  access is being delayed on a watchpoint.
+
+config KCSAN_IGNORE_ATOMICS
+	bool "KCSAN: do not instrument marked atomic accesses"
+	default n
+	help
+	  If enabled, never instruments marked atomic accesses. This results in
+	  not reporting data-races where one access is atomic and the other is
+	  a plain access.
+
+config KCSAN_PLAIN_WRITE_PRETEND_ONCE
+	bool "KCSAN: pretend plain writes are WRITE_ONCE"
+	default n
+	help
+	  This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
+	  This option should only be used to prune initial data-races found in
+	  existing code.
+
+config KCSAN_DEBUG
+	bool "Debugging of KCSAN internals"
+	default n
+
+endif # KCSAN
diff --git a/lib/Makefile b/lib/Makefile
index c5892807e06f..778ab704e3ad 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
 CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
 endif
 
+# Used by KCSAN while enabled, avoid recursion.
+KCSAN_SANITIZE_random32.o := n
+
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 idr.o extable.o \
diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
new file mode 100644
index 000000000000..caf1111a28ae
--- /dev/null
+++ b/scripts/Makefile.kcsan
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+ifdef CONFIG_KCSAN
+
+CFLAGS_KCSAN := -fsanitize=thread
+
+endif # CONFIG_KCSAN
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 179d55af5852..0e78abab7d83 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
 	$(CFLAGS_KCOV))
 endif
 
+#
+# Enable ConcurrencySanitizer flags for kernel except some files or directories
+# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
+#
+ifeq ($(CONFIG_KCSAN),y)
+_c_flags += $(if $(patsubst n%,, \
+	$(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
+	$(CFLAGS_KCSAN))
+endif
+
 # $(srctree)/$(src) for including checkin headers from generated source files
 # $(objtree)/$(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
-- 
2.23.0.866.gb869b98d4c-goog



* [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-17 14:12   ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:12 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
kernel space. KCSAN is a sampling watchpoint-based data-race detector.
See the included Documentation/dev-tools/kcsan.rst for more details.

This patch adds basic infrastructure, but does not yet enable KCSAN for
any architecture.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Elaborate comment about instrumentation calls emitted by compilers.
* Replace kcsan_check_access(.., {true, false}) with
  kcsan_check_{read,write} for improved readability.
* Change bug title of race of unknown origin to just say "data-race in".
* Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
* Add comment about safety of find_watchpoint without user_access_save.
* Remove unnecessary preempt_disable/enable and elaborate on comment why
  we want to disable interrupts and preemptions.
* Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
  contexts [Suggested by Mark Rutland].
---
 Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
 MAINTAINERS                       |  11 +
 Makefile                          |   3 +-
 include/linux/compiler-clang.h    |   9 +
 include/linux/compiler-gcc.h      |   7 +
 include/linux/compiler.h          |  35 ++-
 include/linux/kcsan-checks.h      | 147 ++++++++++
 include/linux/kcsan.h             | 108 ++++++++
 include/linux/sched.h             |   4 +
 init/init_task.c                  |   8 +
 init/main.c                       |   2 +
 kernel/Makefile                   |   1 +
 kernel/kcsan/Makefile             |  14 +
 kernel/kcsan/atomic.c             |  21 ++
 kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
 kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
 kernel/kcsan/encoding.h           |  94 +++++++
 kernel/kcsan/kcsan.c              |  86 ++++++
 kernel/kcsan/kcsan.h              | 140 ++++++++++
 kernel/kcsan/report.c             | 306 +++++++++++++++++++++
 kernel/kcsan/test.c               | 117 ++++++++
 lib/Kconfig.debug                 |   2 +
 lib/Kconfig.kcsan                 |  88 ++++++
 lib/Makefile                      |   3 +
 scripts/Makefile.kcsan            |   6 +
 scripts/Makefile.lib              |  10 +
 26 files changed, 2069 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/dev-tools/kcsan.rst
 create mode 100644 include/linux/kcsan-checks.h
 create mode 100644 include/linux/kcsan.h
 create mode 100644 kernel/kcsan/Makefile
 create mode 100644 kernel/kcsan/atomic.c
 create mode 100644 kernel/kcsan/core.c
 create mode 100644 kernel/kcsan/debugfs.c
 create mode 100644 kernel/kcsan/encoding.h
 create mode 100644 kernel/kcsan/kcsan.c
 create mode 100644 kernel/kcsan/kcsan.h
 create mode 100644 kernel/kcsan/report.c
 create mode 100644 kernel/kcsan/test.c
 create mode 100644 lib/Kconfig.kcsan
 create mode 100644 scripts/Makefile.kcsan
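
Note for reviewers: the slot mapping and overlap check added in
kernel/kcsan/encoding.h can be tried out in userspace. The following is only
a minimal sketch, not part of the patch: watchpoint_slot() and
matching_access() are copied from the diff below, while SKETCH_PAGE_SIZE
(assuming 4 KiB pages), SKETCH_NUM_WATCHPOINTS and sketch_selftest() are
hypothetical names introduced here for illustration. The overlap cases
mirror test_matching_access() in kernel/kcsan/test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL
/* Same value as KCSAN_NUM_WATCHPOINTS in kernel/kcsan/kcsan.h. */
#define SKETCH_NUM_WATCHPOINTS 64

/* Mirrors watchpoint_slot(): all addresses within one page share a slot. */
static inline int watchpoint_slot(unsigned long addr)
{
	return (addr / SKETCH_PAGE_SIZE) % SKETCH_NUM_WATCHPOINTS;
}

/* Mirrors matching_access(): true iff the two byte ranges overlap. */
static inline bool matching_access(unsigned long addr1, size_t size1,
				   unsigned long addr2, size_t size2)
{
	unsigned long end_range1 = addr1 + size1 - 1;
	unsigned long end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}

/* Hypothetical helper: exercises both functions with small fixed inputs. */
static inline void sketch_selftest(void)
{
	/* Two addresses in the same page map to the same slot. */
	assert(watchpoint_slot(0x1000) == watchpoint_slot(0x1fff));
	/* Adjacent pages map to adjacent (different) slots. */
	assert(watchpoint_slot(0x1000) != watchpoint_slot(0x2000));
	/* Slots wrap around after SKETCH_NUM_WATCHPOINTS pages. */
	assert(watchpoint_slot(0x1000) ==
	       watchpoint_slot(0x1000 + 64 * SKETCH_PAGE_SIZE));

	/* Overlapping ranges match ... */
	assert(matching_access(10, 1, 10, 1));
	assert(matching_access(10, 2, 11, 1));
	assert(matching_access(10, 1, 9, 2));
	/* ... while merely adjacent ranges do not. */
	assert(!matching_access(10, 1, 11, 1));
	assert(!matching_access(9, 1, 10, 1));
}
```

Because the slot is derived from the page number only, unrelated accesses to
the same page contend for the same slot; as described in kcsan.h,
KCSAN_CHECK_ADJACENT lets a check also look at neighbouring slots.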

diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
new file mode 100644
index 000000000000..497b09e5cc96
--- /dev/null
+++ b/Documentation/dev-tools/kcsan.rst
@@ -0,0 +1,203 @@
+The Kernel Concurrency Sanitizer (KCSAN)
+========================================
+
+Overview
+--------
+
+*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
+kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
+is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
+detector. Key priorities in KCSAN's design are lack of false positives,
+scalability, and simplicity. More details can be found in `Implementation
+Details`_.
+
+KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
+supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
+With Clang it requires version 7.0.0 or later.
+
+Usage
+-----
+
+To enable KCSAN, configure the kernel with::
+
+    CONFIG_KCSAN = y
+
+KCSAN provides several other configuration options to customize behaviour (see
+their respective help text for more info).
+
+debugfs
+~~~~~~~
+
+* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
+
+* KCSAN can be turned on or off by writing ``on`` or ``off`` to
+  ``/sys/kernel/debug/kcsan``.
+
+* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
+  ``some_func_name`` to the report filter list, which (by default) blacklists
+  reporting data-races where either one of the top stackframes is a function
+  in the list.
+
+* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
+  changes the report filtering behaviour. For example, the blacklist feature
+  can be used to silence frequently occurring data-races; the whitelist feature
+  can help with reproduction and testing of fixes.
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical data-race report looks like this::
+
+    ==================================================================
+    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
+
+    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
+     kernfs_refresh_inode+0x70/0x170
+     kernfs_iop_permission+0x4f/0x90
+     inode_permission+0x190/0x200
+     link_path_walk.part.0+0x503/0x8e0
+     path_lookupat.isra.0+0x69/0x4d0
+     filename_lookup+0x136/0x280
+     user_path_at_empty+0x47/0x60
+     vfs_statx+0x9b/0x130
+     __do_sys_newlstat+0x50/0xb0
+     __x64_sys_newlstat+0x37/0x50
+     do_syscall_64+0x85/0x260
+     entry_SYSCALL_64_after_hwframe+0x44/0xa9
+
+    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
+     generic_permission+0x5b/0x2a0
+     kernfs_iop_permission+0x66/0x90
+     inode_permission+0x190/0x200
+     link_path_walk.part.0+0x503/0x8e0
+     path_lookupat.isra.0+0x69/0x4d0
+     filename_lookup+0x136/0x280
+     user_path_at_empty+0x47/0x60
+     do_faccessat+0x11a/0x390
+     __x64_sys_access+0x3c/0x50
+     do_syscall_64+0x85/0x260
+     entry_SYSCALL_64_after_hwframe+0x44/0xa9
+
+    Reported by Kernel Concurrency Sanitizer on:
+    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
+    ==================================================================
+
+The header of the report provides a short summary of the functions involved in
+the race. It is followed by the access types and stack traces of the 2 threads
+involved in the data-race.
+
+The other less common type of data-race report looks like this::
+
+    ==================================================================
+    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
+
+    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
+     e1000_clean_rx_irq+0x551/0xb10
+     e1000_clean+0x533/0xda0
+     net_rx_action+0x329/0x900
+     __do_softirq+0xdb/0x2db
+     irq_exit+0x9b/0xa0
+     do_IRQ+0x9c/0xf0
+     ret_from_intr+0x0/0x18
+     default_idle+0x3f/0x220
+     arch_cpu_idle+0x21/0x30
+     do_idle+0x1df/0x230
+     cpu_startup_entry+0x14/0x20
+     rest_init+0xc5/0xcb
+     arch_call_rest_init+0x13/0x2b
+     start_kernel+0x6db/0x700
+
+    Reported by Kernel Concurrency Sanitizer on:
+    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
+    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
+    ==================================================================
+
+This report is generated where it was not possible to determine the other
+racing thread, but a race was inferred due to the data-value of the watched
+memory location having changed. These can occur either due to missing
+instrumentation or e.g. DMA accesses.
+
+Data-Races
+----------
+
+Informally, two operations *conflict* if they access the same memory location,
+and at least one of them is a write operation. In an execution, two memory
+operations from different threads form a **data-race** if they *conflict*, at
+least one of them is a *plain access* (non-atomic), and they are *unordered* in
+the "happens-before" order according to the `LKMM
+<../../tools/memory-model/Documentation/explanation.txt>`_.
+
+Relationship with the Linux Kernel Memory Model (LKMM)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The LKMM defines the propagation and ordering rules of various memory
+operations, which gives developers the ability to reason about concurrent code.
+Ultimately this allows determining the possible executions of concurrent code,
+and whether that code is free from data-races.
+
+KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
+``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
+words, KCSAN assumes that as long as a plain access is not observed to race
+with another conflicting access, memory operations are correctly ordered.
+
+This means that KCSAN will not report *potential* data-races due to missing
+memory ordering. If, however, missing memory ordering (that is observable with
+a particular compiler and architecture) leads to an observable data-race (e.g.
+entering a critical section erroneously), KCSAN would report the resulting
+data-race.
+
+Implementation Details
+----------------------
+
+The general approach is inspired by `DataCollider
+<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
+Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
+relies on compiler instrumentation. Watchpoints are implemented using an
+efficient encoding that stores access type, size, and address in a long; the
+benefits of using "soft watchpoints" are portability and greater flexibility in
+limiting which accesses trigger a watchpoint.
+
+More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
+memory operations; for each instrumented plain access:
+
+1. Check if a matching watchpoint exists; if yes, and at least one access is a
+   write, then we encountered a racing access.
+
+2. Periodically, if no matching watchpoint exists, set up a watchpoint and
+   stall for a small delay.
+
+3. Also check the data value before the delay, and re-check the data value
+   after the delay; if the values mismatch, we infer a race of unknown origin.
+
+To detect data-races between plain and atomic memory operations, KCSAN also
+annotates atomic accesses, but only to check if a watchpoint exists
+(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
+accesses.
+
+Key Properties
+~~~~~~~~~~~~~~
+
+1. **Memory Overhead:** No shadow memory is required. The current
+   implementation uses a small array of longs to encode watchpoint information,
+   which is negligible.
+
+2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
+   efficient watchpoint encoding that does not require acquiring any shared
+   locks in the fast-path. For kernel boot with a default config on a system
+   where nproc=8 we measure a slow-down of 10-15x.
+
+3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
+   may result in missed data-races (false negatives), compared to a
+   happens-before data-race detector.
+
+4. **Accuracy:** Imprecise, since it uses a sampling strategy.
+
+5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
+   runtime. With a happens-before data-race detector, any omission leads to
+   false positives, which is especially important in the context of the kernel
+   which includes numerous custom synchronization mechanisms. With KCSAN, as a
+   result, maintenance overheads are minimal as the kernel evolves.
+
+6. **Detects Racy Writes from Devices:** Due to checking data values upon
+   setting up watchpoints, racy writes from devices can also be detected.
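
A note on the three steps listed under "Implementation Details" in kcsan.rst:
the following is a minimal single-threaded userspace sketch of that flow, not
the code from this series. Every name in it (model_watchpoint, model_check,
model_watch, the demo_* helpers) is hypothetical, the watchpoint is a plain
struct rather than the encoded long described above, and the "stall" is a
callback standing in for the delay during which another context may run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one soft watchpoint; not KCSAN's real encoding. */
struct model_watchpoint {
	const volatile void *addr; /* NULL: no watchpoint armed */
	size_t size;
	bool is_write;
};

static struct model_watchpoint model_wp;

/*
 * Step 1: an access races if it overlaps the armed watchpoint and at least
 * one of the two accesses is a write.
 */
static bool model_check(const volatile void *ptr, size_t size, bool is_write)
{
	uintptr_t a = (uintptr_t)ptr;
	uintptr_t w = (uintptr_t)model_wp.addr;

	if (model_wp.addr == NULL)
		return false;
	if (!is_write && !model_wp.is_write)
		return false; /* two reads never conflict */
	return a < w + model_wp.size && w < a + size;
}

/*
 * Steps 2 and 3: arm the watchpoint, snapshot the value, stall, then re-read.
 * Returns true if the value changed while the watchpoint was armed.
 */
static bool model_watch(volatile uint32_t *ptr, bool is_write,
			void (*stall)(void))
{
	uint32_t before;
	bool changed;

	assert(model_wp.addr == NULL); /* this model has a single slot */
	model_wp.addr = ptr;
	model_wp.size = sizeof(*ptr);
	model_wp.is_write = is_write;

	before = *ptr;
	stall(); /* stands in for the delay */
	changed = (*ptr != before);

	model_wp.addr = NULL;
	return changed;
}

/* Demo helpers: what "another context" might do during the stall. */
static volatile uint32_t demo_word;
static bool demo_race_seen;

static void demo_idle(void)
{
}

static void demo_instrumented_write(void)
{
	/* An instrumented access checks for a watchpoint first. */
	demo_race_seen = model_check(&demo_word, sizeof(demo_word), true);
	demo_word = demo_word + 1;
}

static void demo_device_write(void)
{
	/* A device-like write: no instrumentation, so no model_check(). */
	demo_word = demo_word + 1;
}
```

With demo_device_write() as the stall callback, model_check() is never called,
yet model_watch() still returns true; this corresponds to the "race at unknown
origin" report. With demo_instrumented_write(), the racing access itself finds
the watchpoint, which corresponds to the more common report with two stack
traces.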
diff --git a/MAINTAINERS b/MAINTAINERS
index 0154674cbad3..71f7fb625490 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8847,6 +8847,17 @@ F:	Documentation/kbuild/kconfig*
 F:	scripts/kconfig/
 F:	scripts/Kconfig.include
 
+KCSAN
+M:	Marco Elver <elver@google.com>
+R:	Dmitry Vyukov <dvyukov@google.com>
+L:	kasan-dev@googlegroups.com
+S:	Maintained
+F:	Documentation/dev-tools/kcsan.rst
+F:	include/linux/kcsan*.h
+F:	kernel/kcsan/
+F:	lib/Kconfig.kcsan
+F:	scripts/Makefile.kcsan
+
 KDUMP
 M:	Dave Young <dyoung@redhat.com>
 M:	Baoquan He <bhe@redhat.com>
diff --git a/Makefile b/Makefile
index ffd7a912fc46..ad4729176252 100644
--- a/Makefile
+++ b/Makefile
@@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
 
 export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
 export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
-export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
+export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
 export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
 export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
 export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
@@ -900,6 +900,7 @@ endif
 include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 include scripts/Makefile.ubsan
+include scripts/Makefile.kcsan
 
 # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
 KBUILD_CPPFLAGS += $(KCPPFLAGS)
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 333a6695a918..a213eb55e725 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -24,6 +24,15 @@
 #define __no_sanitize_address
 #endif
 
+#if __has_feature(thread_sanitizer)
+/* emulate gcc's __SANITIZE_THREAD__ flag */
+#define __SANITIZE_THREAD__
+#define __no_sanitize_thread \
+		__attribute__((no_sanitize("thread")))
+#else
+#define __no_sanitize_thread
+#endif
+
 /*
  * Not all versions of clang implement the the type-generic versions
  * of the builtin overflow checkers. Fortunately, clang implements
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index d7ee4c6bad48..de105ca29282 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -145,6 +145,13 @@
 #define __no_sanitize_address
 #endif
 
+#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
+#define __no_sanitize_thread                                                   \
+	__attribute__((__noinline__)) __attribute__((no_sanitize_thread))
+#else
+#define __no_sanitize_thread
+#endif
+
 #if GCC_VERSION >= 50100
 #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
 #endif
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 5e88e7e33abe..350d80dbee4d 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 #endif
 
 #include <uapi/linux/types.h>
+#include <linux/kcsan-checks.h>
 
 #define __READ_ONCE_SIZE						\
 ({									\
@@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 	}								\
 })
 
-static __always_inline
-void __read_once_size(const volatile void *p, void *res, int size)
-{
-	__READ_ONCE_SIZE;
-}
-
 #ifdef CONFIG_KASAN
 /*
  * We can't declare function 'inline' because __no_sanitize_address confilcts
@@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
 # define __no_kasan_or_inline __always_inline
 #endif
 
-static __no_kasan_or_inline
+#ifdef CONFIG_KCSAN
+# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
+#else
+# define __no_kcsan_or_inline __always_inline
+#endif
+
+#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
+/* Avoid any instrumentation or inline. */
+#define __no_sanitize_or_inline                                                \
+	__no_sanitize_address __no_sanitize_thread notrace __maybe_unused
+#else
+#define __no_sanitize_or_inline __always_inline
+#endif
+
+static __no_kcsan_or_inline
+void __read_once_size(const volatile void *p, void *res, int size)
+{
+	kcsan_check_atomic_read((const void *)p, size);
+	__READ_ONCE_SIZE;
+}
+
+static __no_sanitize_or_inline
 void __read_once_size_nocheck(const volatile void *p, void *res, int size)
 {
 	__READ_ONCE_SIZE;
 }
 
-static __always_inline void __write_once_size(volatile void *p, void *res, int size)
+static __no_kcsan_or_inline
+void __write_once_size(volatile void *p, void *res, int size)
 {
+	kcsan_check_atomic_write((const void *)p, size);
+
 	switch (size) {
 	case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
 	case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
new file mode 100644
index 000000000000..4203603ae852
--- /dev/null
+++ b/include/linux/kcsan-checks.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_KCSAN_CHECKS_H
+#define _LINUX_KCSAN_CHECKS_H
+
+#include <linux/types.h>
+
+/*
+ * __kcsan_*: Always available when KCSAN is enabled. These may be used even
+ * in compilation units that selectively disable KCSAN, but that must still
+ * use KCSAN to validate access to an address. Never use these in header files!
+ */
+#ifdef CONFIG_KCSAN
+/**
+ * __kcsan_check_watchpoint - check if a watchpoint exists
+ *
+ * Returns true if no race was detected, and we may then proceed to set up a
+ * watchpoint after. Returns false if either KCSAN is disabled or a race was
+ * encountered, and we may not set up a watchpoint after.
+ *
+ * @ptr address of access
+ * @size size of access
+ * @is_write is access a write
+ * @return true if no race was detected, false otherwise.
+ */
+bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write);
+
+/**
+ * __kcsan_setup_watchpoint - set up watchpoint and report data-races
+ *
+ * Sets up a watchpoint (if sampled), and if a racing access was observed,
+ * reports the data-race.
+ *
+ * @ptr address of access
+ * @size size of access
+ * @is_write is access a write
+ */
+void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write);
+#else
+static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
+					    size_t size, bool is_write)
+{
+	return true;
+}
+static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
+					    size_t size, bool is_write)
+{
+}
+#endif
+
+/*
+ * kcsan_*: Only available when the particular compilation unit has KCSAN
+ * instrumentation enabled. May be used in header files.
+ */
+#ifdef __SANITIZE_THREAD__
+#define kcsan_check_watchpoint __kcsan_check_watchpoint
+#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
+#else
+static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
+					  bool is_write)
+{
+	return true;
+}
+static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
+					  bool is_write)
+{
+}
+#endif
+
+/**
+ * __kcsan_check_read - check regular read access for data-races
+ *
+ * Full read access that checks watchpoint and sets up a watchpoint if this
+ * access is sampled. Note that setting up watchpoints for plain reads is
+ * required to also detect data-races with atomic accesses.
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define __kcsan_check_read(ptr, size)                                          \
+	do {                                                                   \
+		if (__kcsan_check_watchpoint(ptr, size, false))                \
+			__kcsan_setup_watchpoint(ptr, size, false);            \
+	} while (0)
+
+/**
+ * __kcsan_check_write - check regular write access for data-races
+ *
+ * Full write access that checks watchpoint and sets up a watchpoint if this
+ * access is sampled.
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define __kcsan_check_write(ptr, size)                                         \
+	do {                                                                   \
+		if (__kcsan_check_watchpoint(ptr, size, true) &&               \
+		    !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
+			__kcsan_setup_watchpoint(ptr, size, true);             \
+	} while (0)
+
+/**
+ * kcsan_check_read - check regular read access for data-races
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define kcsan_check_read(ptr, size)                                            \
+	do {                                                                   \
+		if (kcsan_check_watchpoint(ptr, size, false))                  \
+			kcsan_setup_watchpoint(ptr, size, false);              \
+	} while (0)
+
+/**
+ * kcsan_check_write - check regular write access for data-races
+ *
+ * @ptr address of access
+ * @size size of access
+ */
+#define kcsan_check_write(ptr, size)                                           \
+	do {                                                                   \
+		if (kcsan_check_watchpoint(ptr, size, true) &&                 \
+		    !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
+			kcsan_setup_watchpoint(ptr, size, true);               \
+	} while (0)
+
+/*
+ * Check for atomic accesses: if atomic accesses are not ignored, this simply
+ * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
+ */
+#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
+#define kcsan_check_atomic_read(...)                                           \
+	do {                                                                   \
+	} while (0)
+#define kcsan_check_atomic_write(...)                                          \
+	do {                                                                   \
+	} while (0)
+#else
+#define kcsan_check_atomic_read(ptr, size)                                     \
+	kcsan_check_watchpoint(ptr, size, false)
+#define kcsan_check_atomic_write(ptr, size)                                    \
+	kcsan_check_watchpoint(ptr, size, true)
+#endif
+
+#endif /* _LINUX_KCSAN_CHECKS_H */
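
To illustrate the difference between the plain and atomic check macros in
kcsan-checks.h above, here is a small userspace mock. It is only a sketch:
the mock_* functions are hypothetical stand-ins that count calls instead of
touching any watchpoint state, and mock_race_pending is an invented flag that
makes the check report a race. The macro shapes mirror kcsan_check_read() and
kcsan_check_atomic_read() as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int mock_checks;        /* calls to the watchpoint check */
static int mock_setups;        /* calls to the watchpoint setup */
static bool mock_race_pending; /* hypothetical: force the check to fail */

/* Stand-in for the check: returns true if no race was detected. */
static bool mock_check_watchpoint(const volatile void *ptr, size_t size,
				  bool is_write)
{
	(void)ptr;
	(void)is_write;
	assert(size > 0);
	mock_checks++;
	return !mock_race_pending;
}

/* Stand-in for the setup: only counts invocations. */
static void mock_setup_watchpoint(const volatile void *ptr, size_t size,
				  bool is_write)
{
	(void)ptr;
	(void)size;
	(void)is_write;
	mock_setups++;
}

/* Plain read: check first, and set up a watchpoint only if no race. */
#define mock_check_read(ptr, size)                                             \
	do {                                                                   \
		if (mock_check_watchpoint(ptr, size, false))                   \
			mock_setup_watchpoint(ptr, size, false);               \
	} while (0)

/* Atomic read: check only, never set up a watchpoint. */
#define mock_check_atomic_read(ptr, size)                                      \
	mock_check_watchpoint(ptr, size, false)
```

In this mock a plain read increments both counters, an atomic read increments
only mock_checks, and a plain read that hits a race skips the setup. That
matches the documented contract that a watchpoint may only be set up after
the check returned true.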
diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
new file mode 100644
index 000000000000..fd5de2ba3a16
--- /dev/null
+++ b/include/linux/kcsan.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_KCSAN_H
+#define _LINUX_KCSAN_H
+
+#include <linux/types.h>
+#include <linux/kcsan-checks.h>
+
+#ifdef CONFIG_KCSAN
+
+/*
+ * Context for each thread of execution: for tasks, this is stored in
+ * task_struct, and interrupts access internal per-CPU storage.
+ */
+struct kcsan_ctx {
+	int disable; /* disable counter */
+	int atomic_next; /* number of following atomic ops */
+
+	/*
+	 * We use separate variables to store if we are in a nestable or flat
+	 * atomic region. This helps make sure that an atomic region with
+	 * nesting support is not suddenly aborted when a flat region is
+	 * contained within. Effectively this allows supporting nesting flat
+	 * atomic regions within an outer nestable atomic region. Support for
+	 * this is required as there are cases where a seqlock reader critical
+	 * section (flat atomic region) is contained within a seqlock writer
+	 * critical section (nestable atomic region), and the "mismatching
+	 * kcsan_end_atomic()" warning would trigger otherwise.
+	 */
+	int atomic_region;
+	bool atomic_region_flat;
+};
+
+/**
+ * kcsan_init - initialize KCSAN runtime
+ */
+void kcsan_init(void);
+
+/**
+ * kcsan_disable_current - disable KCSAN for the current context
+ *
+ * Supports nesting.
+ */
+void kcsan_disable_current(void);
+
+/**
+ * kcsan_enable_current - re-enable KCSAN for the current context
+ *
+ * Supports nesting.
+ */
+void kcsan_enable_current(void);
+
+/**
+ * kcsan_begin_atomic - use to denote an atomic region
+ *
+ * Accesses within the atomic region may appear to race with other accesses but
+ * should be considered atomic.
+ *
+ * @nest true if regions may be nested, or false for flat region
+ */
+void kcsan_begin_atomic(bool nest);
+
+/**
+ * kcsan_end_atomic - end atomic region
+ *
+ * @nest must match argument to kcsan_begin_atomic().
+ */
+void kcsan_end_atomic(bool nest);
+
+/**
+ * kcsan_atomic_next - consider following accesses as atomic
+ *
+ * Force treating the next n memory accesses for the current context as atomic
+ * operations.
+ *
+ * @n number of following memory accesses to treat as atomic.
+ */
+void kcsan_atomic_next(int n);
+
+#else /* CONFIG_KCSAN */
+
+static inline void kcsan_init(void)
+{
+}
+
+static inline void kcsan_disable_current(void)
+{
+}
+
+static inline void kcsan_enable_current(void)
+{
+}
+
+static inline void kcsan_begin_atomic(bool nest)
+{
+}
+
+static inline void kcsan_end_atomic(bool nest)
+{
+}
+
+static inline void kcsan_atomic_next(int n)
+{
+}
+
+#endif /* CONFIG_KCSAN */
+
+#endif /* _LINUX_KCSAN_H */
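
The comment in struct kcsan_ctx explains why nestable and flat atomic regions
are tracked separately. The userspace sketch below models that bookkeeping
together with the atomic_next countdown, following the logic of
kcsan_begin_atomic(), kcsan_end_atomic() and is_atomic() in core.c. The
model_* names are hypothetical, the context is passed explicitly rather than
looked up per task or per CPU, and an assert() stands in for the WARN() on a
mismatched end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-context state. */
struct model_ctx {
	int atomic_next;         /* number of following accesses to treat as atomic */
	int atomic_region;       /* nesting depth of nestable atomic regions */
	bool atomic_region_flat; /* inside a flat (non-nesting) atomic region */
};

static void model_begin_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void model_end_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest) {
		/* The patch warns and recovers here; the model just asserts. */
		assert(ctx->atomic_region > 0);
		--ctx->atomic_region;
	} else {
		ctx->atomic_region_flat = false;
	}
}

/* Should the next access be treated as atomic in this context? */
static bool model_is_atomic(struct model_ctx *ctx)
{
	if (ctx->atomic_next > 0) {
		--ctx->atomic_next; /* consumed by this access */
		return true;
	}
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

The case called out in the struct comment is a flat region (seqlock reader)
inside a nestable region (seqlock writer): ending the inner flat region
clears only atomic_region_flat, so the outer region is still in effect until
its own matching end.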
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c2e56bd8913..9490e417bf4a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -31,6 +31,7 @@
 #include <linux/task_io_accounting.h>
 #include <linux/posix-timers.h>
 #include <linux/rseq.h>
+#include <linux/kcsan.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -1171,6 +1172,9 @@ struct task_struct {
 #ifdef CONFIG_KASAN
 	unsigned int			kasan_depth;
 #endif
+#ifdef CONFIG_KCSAN
+	struct kcsan_ctx		kcsan_ctx;
+#endif
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	/* Index of current stored address in ret_stack: */
diff --git a/init/init_task.c b/init/init_task.c
index 9e5cbe5eab7b..e229416c3314 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -161,6 +161,14 @@ struct task_struct init_task
 #ifdef CONFIG_KASAN
 	.kasan_depth	= 1,
 #endif
+#ifdef CONFIG_KCSAN
+	.kcsan_ctx = {
+		.disable		= 1,
+		.atomic_next		= 0,
+		.atomic_region		= 0,
+		.atomic_region_flat	= 0,
+	},
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	.softirqs_enabled = 1,
 #endif
diff --git a/init/main.c b/init/main.c
index 91f6ebb30ef0..4d814de017ee 100644
--- a/init/main.c
+++ b/init/main.c
@@ -93,6 +93,7 @@
 #include <linux/rodata_test.h>
 #include <linux/jump_label.h>
 #include <linux/mem_encrypt.h>
+#include <linux/kcsan.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
 	acpi_subsystem_init();
 	arch_post_acpi_subsys_init();
 	sfi_init_late();
+	kcsan_init();
 
 	/* Do the rest non-__init'ed, we're now alive */
 	arch_call_rest_init();
diff --git a/kernel/Makefile b/kernel/Makefile
index daad787fb795..74ab46e2ebd1 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
 obj-$(CONFIG_IRQ_WORK) += irq_work.o
 obj-$(CONFIG_CPU_PM) += cpu_pm.o
 obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_KCSAN) += kcsan/
 
 obj-$(CONFIG_PERF_EVENTS) += events/
 
diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
new file mode 100644
index 000000000000..c25f07062d26
--- /dev/null
+++ b/kernel/kcsan/Makefile
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+KCSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+
+CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
+
+CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+
+obj-y := kcsan.o core.o atomic.o debugfs.o report.o
+obj-$(CONFIG_KCSAN_SELFTEST) += test.o
diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
new file mode 100644
index 000000000000..dd44f7d9e491
--- /dev/null
+++ b/kernel/kcsan/atomic.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/jiffies.h>
+
+#include "kcsan.h"
+
+/*
+ * List all volatile globals that have been observed in races, to suppress
+ * data-race reports between accesses to these variables.
+ *
+ * For now, we assume that volatile accesses of globals are as strong as atomic
+ * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
+ * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
+ * than cast to volatile. Eventually, we hope to be able to remove this
+ * function.
+ */
+bool kcsan_is_atomic(const volatile void *ptr)
+{
+	/* only jiffies for now */
+	return ptr == &jiffies;
+}
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
new file mode 100644
index 000000000000..bc8d60b129eb
--- /dev/null
+++ b/kernel/kcsan/core.c
@@ -0,0 +1,428 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/random.h>
+#include <linux/sched.h>
+#include <linux/uaccess.h>
+
+#include "kcsan.h"
+#include "encoding.h"
+
+/*
+ * Helper macros to iterate slots, starting from address slot itself, followed
+ * by the right and left slots.
+ */
+#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
+#define SLOT_IDX(slot, i)                                                      \
+	((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
+		  KCSAN_CHECK_ADJACENT)) %                                     \
+	 KCSAN_NUM_WATCHPOINTS)
+
+bool kcsan_enabled;
+
+/* Per-CPU kcsan_ctx for interrupts */
+static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
+	.disable = 0,
+	.atomic_next = 0,
+	.atomic_region = 0,
+	.atomic_region_flat = 0,
+};
+
+/*
+ * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
+ * able to safely update and access a watchpoint without introducing locking
+ * overhead, we encode each watchpoint as a single atomic long. The initial
+ * zero-initialized state matches INVALID_WATCHPOINT.
+ */
+static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
+
+/*
+ * Instructions skipped counter; see should_watch().
+ */
+static DEFINE_PER_CPU(unsigned long, kcsan_skip);
+
+static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
+					     bool expect_write,
+					     long *encoded_watchpoint)
+{
+	const int slot = watchpoint_slot(addr);
+	const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
+	atomic_long_t *watchpoint;
+	unsigned long wp_addr_masked;
+	size_t wp_size;
+	bool is_write;
+	int i;
+
+	for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
+		watchpoint = &watchpoints[SLOT_IDX(slot, i)];
+		*encoded_watchpoint = atomic_long_read(watchpoint);
+		if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
+				       &wp_size, &is_write))
+			continue;
+
+		if (expect_write && !is_write)
+			continue;
+
+		/* Check if the watchpoint matches the access. */
+		if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
+			return watchpoint;
+	}
+
+	return NULL;
+}
+
+static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
+					       bool is_write)
+{
+	const int slot = watchpoint_slot(addr);
+	const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
+	atomic_long_t *watchpoint;
+	int i;
+
+	for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
+		long expect_val = INVALID_WATCHPOINT;
+
+		/* Try to acquire this slot. */
+		watchpoint = &watchpoints[SLOT_IDX(slot, i)];
+		if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
+						    encoded_watchpoint))
+			return watchpoint;
+	}
+
+	return NULL;
+}
+
+/*
+ * Return true if watchpoint was successfully consumed, false otherwise.
+ *
+ * This may return false if:
+ *
+ *	1. another thread already consumed the watchpoint;
+ *	2. the thread that set up the watchpoint already removed it;
+ *	3. the watchpoint was removed and then re-used.
+ */
+static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
+					  long encoded_watchpoint)
+{
+	return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
+					       CONSUMED_WATCHPOINT);
+}
+
+/*
+ * Return true if watchpoint was not touched, false if consumed.
+ */
+static inline bool remove_watchpoint(atomic_long_t *watchpoint)
+{
+	return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
+	       CONSUMED_WATCHPOINT;
+}
+
+static inline struct kcsan_ctx *get_ctx(void)
+{
+	/*
+	 * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
+	 * also result in calls that generate warnings in uaccess regions.
+	 */
+	return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
+}
+
+static inline bool is_atomic(const volatile void *ptr)
+{
+	struct kcsan_ctx *ctx = get_ctx();
+
+	if (unlikely(ctx->atomic_next > 0)) {
+		--ctx->atomic_next;
+		return true;
+	}
+	if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
+		return true;
+
+	return kcsan_is_atomic(ptr);
+}
+
+static inline bool should_watch(const volatile void *ptr)
+{
+	/*
+	 * Never set up watchpoints when memory operations are atomic.
+	 *
+	 * We need to check this first, because: 1) atomics should not count
+	 * towards skipped instructions below, and 2) to actually decrement
+	 * kcsan_atomic_next for each atomic.
+	 */
+	if (is_atomic(ptr))
+		return false;
+
+	/*
+	 * We use a per-CPU counter, to avoid excessive contention; there is
+	 * still enough non-determinism for the precise instructions that end up
+	 * being watched to be mostly unpredictable. Using a PRNG like
+	 * prandom_u32() turned out to be too slow.
+	 */
+	return (this_cpu_inc_return(kcsan_skip) %
+		CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
+}
+
+static inline bool is_enabled(void)
+{
+	return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
+}
+
+static inline unsigned int get_delay(void)
+{
+	unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
+					     CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
+	return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
+		       ((prandom_u32() % max_delay) + 1) :
+		       max_delay;
+}
+
+/* === Public interface ===================================================== */
+
+void __init kcsan_init(void)
+{
+	BUG_ON(!in_task());
+
+	kcsan_debugfs_init();
+	kcsan_enable_current();
+#ifdef CONFIG_KCSAN_EARLY_ENABLE
+	/*
+	 * We are in the init task, and no other tasks should be running.
+	 */
+	WRITE_ONCE(kcsan_enabled, true);
+#endif
+}
+
+/* === Exported interface =================================================== */
+
+void kcsan_disable_current(void)
+{
+	++get_ctx()->disable;
+}
+EXPORT_SYMBOL(kcsan_disable_current);
+
+void kcsan_enable_current(void)
+{
+	if (get_ctx()->disable-- == 0) {
+		kcsan_disable_current(); /* restore to 0 */
+		kcsan_disable_current();
+		WARN(1, "mismatching %s", __func__);
+		kcsan_enable_current();
+	}
+}
+EXPORT_SYMBOL(kcsan_enable_current);
+
+void kcsan_begin_atomic(bool nest)
+{
+	if (nest)
+		++get_ctx()->atomic_region;
+	else
+		get_ctx()->atomic_region_flat = true;
+}
+EXPORT_SYMBOL(kcsan_begin_atomic);
+
+void kcsan_end_atomic(bool nest)
+{
+	if (nest) {
+		if (get_ctx()->atomic_region-- == 0) {
+			kcsan_begin_atomic(true); /* restore to 0 */
+			kcsan_disable_current();
+			WARN(1, "mismatching %s", __func__);
+			kcsan_enable_current();
+		}
+	} else {
+		get_ctx()->atomic_region_flat = false;
+	}
+}
+EXPORT_SYMBOL(kcsan_end_atomic);
+
+void kcsan_atomic_next(int n)
+{
+	get_ctx()->atomic_next = n;
+}
+EXPORT_SYMBOL(kcsan_atomic_next);
+
+bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write)
+{
+	atomic_long_t *watchpoint;
+	long encoded_watchpoint;
+	unsigned long flags;
+	enum kcsan_report_type report_type;
+
+	if (unlikely(!is_enabled()))
+		return false;
+
+	/*
+	 * Avoid user_access_save in fast-path here: find_watchpoint is safe
+	 * without user_access_save, as the address that ptr points to is only
+	 * used to check if a watchpoint exists; ptr is never dereferenced.
+	 */
+	watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
+				     &encoded_watchpoint);
+	if (watchpoint == NULL)
+		return true;
+
+	flags = user_access_save();
+	if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
+		/*
+		 * The other thread may not print any diagnostics, as it has
+		 * already removed the watchpoint, or another thread consumed
+		 * the watchpoint before this thread.
+		 */
+		kcsan_counter_inc(kcsan_counter_report_races);
+		report_type = kcsan_report_race_check_race;
+	} else {
+		report_type = kcsan_report_race_check;
+	}
+
+	/* Encountered a data-race. */
+	kcsan_counter_inc(kcsan_counter_data_races);
+	kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
+
+	user_access_restore(flags);
+	return false;
+}
+EXPORT_SYMBOL(__kcsan_check_watchpoint);
+
+void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
+			      bool is_write)
+{
+	atomic_long_t *watchpoint;
+	union {
+		u8 _1;
+		u16 _2;
+		u32 _4;
+		u64 _8;
+	} expect_value;
+	bool is_expected = true;
+	unsigned long ua_flags = user_access_save();
+	unsigned long irq_flags;
+
+	if (!should_watch(ptr))
+		goto out;
+
+	if (!check_encodable((unsigned long)ptr, size)) {
+		kcsan_counter_inc(kcsan_counter_unencodable_accesses);
+		goto out;
+	}
+
+	/*
+	 * Disable interrupts & preemptions to avoid another thread on the same
+	 * CPU accessing memory locations for the set up watchpoint; this is to
+	 * avoid reporting races to e.g. CPU-local data.
+	 *
+	 * An alternative would be adding the source CPU to the watchpoint
+	 * encoding, and checking that watchpoint-CPU != this-CPU. There are
+	 * several problems with this:
+	 *   1. we should avoid stealing more bits from the watchpoint encoding
+	 *      as it would affect accuracy, as well as increase performance
+	 *      overhead in the fast-path;
+	 *   2. if we are preempted, but there *is* a genuine data-race, we
+	 *      would *not* report it -- since this is the common case (vs.
+	 *      CPU-local data accesses), it makes more sense (from a data-race
+	 *      detection PoV) to simply disable preemptions to ensure as many
+	 *      tasks as possible run on other CPUs.
+	 */
+	local_irq_save(irq_flags);
+
+	watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
+	if (watchpoint == NULL) {
+		/*
+		 * Out of capacity: the size of `watchpoints`, and the frequency
+		 * with which `should_watch()` returns true should be tweaked so
+		 * that this case happens very rarely.
+		 */
+		kcsan_counter_inc(kcsan_counter_no_capacity);
+		goto out_unlock;
+	}
+
+	kcsan_counter_inc(kcsan_counter_setup_watchpoints);
+	kcsan_counter_inc(kcsan_counter_used_watchpoints);
+
+	/*
+	 * Read the current value, to later check and infer a race if the data
+	 * was modified via a non-instrumented access, e.g. from a device.
+	 */
+	switch (size) {
+	case 1:
+		expect_value._1 = READ_ONCE(*(const u8 *)ptr);
+		break;
+	case 2:
+		expect_value._2 = READ_ONCE(*(const u16 *)ptr);
+		break;
+	case 4:
+		expect_value._4 = READ_ONCE(*(const u32 *)ptr);
+		break;
+	case 8:
+		expect_value._8 = READ_ONCE(*(const u64 *)ptr);
+		break;
+	default:
+		break; /* ignore; we do not diff the values */
+	}
+
+#ifdef CONFIG_KCSAN_DEBUG
+	kcsan_disable_current();
+	pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
+	       is_write ? "write" : "read", size, ptr,
+	       watchpoint_slot((unsigned long)ptr),
+	       encode_watchpoint((unsigned long)ptr, size, is_write));
+	kcsan_enable_current();
+#endif
+
+	/*
+	 * Delay this thread, to increase probability of observing a racy
+	 * conflicting access.
+	 */
+	udelay(get_delay());
+
+	/*
+	 * Re-read value, and check if it is as expected; if not, we infer a
+	 * racy access.
+	 */
+	switch (size) {
+	case 1:
+		is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
+		break;
+	case 2:
+		is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
+		break;
+	case 4:
+		is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
+		break;
+	case 8:
+		is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
+		break;
+	default:
+		break; /* ignore; we do not diff the values */
+	}
+
+	/* Check if this access raced with another. */
+	if (!remove_watchpoint(watchpoint)) {
+		/*
+		 * No need to increment 'race' counter, as the racing thread
+		 * already did.
+		 */
+		kcsan_report(ptr, size, is_write, smp_processor_id(),
+			     kcsan_report_race_setup);
+	} else if (!is_expected) {
+		/* Inferring a race, since the value should not have changed. */
+		kcsan_counter_inc(kcsan_counter_races_unknown_origin);
+#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
+		kcsan_report(ptr, size, is_write, smp_processor_id(),
+			     kcsan_report_race_unknown_origin);
+#endif
+	}
+
+	kcsan_counter_dec(kcsan_counter_used_watchpoints);
+out_unlock:
+	local_irq_restore(irq_flags);
+out:
+	user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__kcsan_setup_watchpoint);
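
For illustration only, a minimal user-space sketch of the "snapshot, stall,
re-read" check that the end of __kcsan_setup_watchpoint() performs. The names
snapshot_take() and snapshot_unchanged() are hypothetical and do not appear in
the patch; plain volatile loads stand in for READ_ONCE(), and the delay is left
to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the expect_value union used in the patch. */
union snapshot {
	uint8_t _1;
	uint16_t _2;
	uint32_t _4;
	uint64_t _8;
};

/* Record the current value; returns false for sizes that are not diffed. */
static bool snapshot_take(const volatile void *ptr, size_t size,
			  union snapshot *s)
{
	switch (size) {
	case 1: s->_1 = *(const volatile uint8_t *)ptr; return true;
	case 2: s->_2 = *(const volatile uint16_t *)ptr; return true;
	case 4: s->_4 = *(const volatile uint32_t *)ptr; return true;
	case 8: s->_8 = *(const volatile uint64_t *)ptr; return true;
	default: return false; /* ignore; values are not diffed */
	}
}

/* Re-read after the stall; an unexpected change implies a racing writer. */
static bool snapshot_unchanged(const volatile void *ptr, size_t size,
			       const union snapshot *s)
{
	switch (size) {
	case 1: return s->_1 == *(const volatile uint8_t *)ptr;
	case 2: return s->_2 == *(const volatile uint16_t *)ptr;
	case 4: return s->_4 == *(const volatile uint32_t *)ptr;
	case 8: return s->_8 == *(const volatile uint64_t *)ptr;
	default: return true; /* cannot tell, so assume unchanged */
	}
}
```

A caller would take the snapshot, wait, and treat a false result from
snapshot_unchanged() as a race of unknown origin, which is the case the patch
counts in kcsan_counter_races_unknown_origin.
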
diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
new file mode 100644
index 000000000000..6ddcbd185f3a
--- /dev/null
+++ b/kernel/kcsan/debugfs.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/atomic.h>
+#include <linux/bsearch.h>
+#include <linux/bug.h>
+#include <linux/debugfs.h>
+#include <linux/init.h>
+#include <linux/kallsyms.h>
+#include <linux/mm.h>
+#include <linux/seq_file.h>
+#include <linux/sort.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+
+#include "kcsan.h"
+
+/*
+ * Statistics counters.
+ */
+static atomic_long_t counters[kcsan_counter_count];
+
+/*
+ * Addresses for filtering functions from reporting. This list can be used as a
+ * whitelist or blacklist.
+ */
+static struct {
+	unsigned long *addrs; /* array of addresses */
+	size_t size; /* current size */
+	int used; /* number of elements used */
+	bool sorted; /* if elements are sorted */
+	bool whitelist; /* if list is a blacklist or whitelist */
+} report_filterlist = {
+	.addrs = NULL,
+	.size = 8, /* small initial size */
+	.used = 0,
+	.sorted = false,
+	.whitelist = false, /* default is blacklist */
+};
+static DEFINE_SPINLOCK(report_filterlist_lock);
+
+static const char *counter_to_name(enum kcsan_counter_id id)
+{
+	switch (id) {
+	case kcsan_counter_used_watchpoints:
+		return "used_watchpoints";
+	case kcsan_counter_setup_watchpoints:
+		return "setup_watchpoints";
+	case kcsan_counter_data_races:
+		return "data_races";
+	case kcsan_counter_no_capacity:
+		return "no_capacity";
+	case kcsan_counter_report_races:
+		return "report_races";
+	case kcsan_counter_races_unknown_origin:
+		return "races_unknown_origin";
+	case kcsan_counter_unencodable_accesses:
+		return "unencodable_accesses";
+	case kcsan_counter_encoding_false_positives:
+		return "encoding_false_positives";
+	case kcsan_counter_count:
+		BUG();
+	}
+	return NULL;
+}
+
+void kcsan_counter_inc(enum kcsan_counter_id id)
+{
+	atomic_long_inc(&counters[id]);
+}
+
+void kcsan_counter_dec(enum kcsan_counter_id id)
+{
+	atomic_long_dec(&counters[id]);
+}
+
+static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
+{
+	const unsigned long a = *(const unsigned long *)lhs;
+	const unsigned long b = *(const unsigned long *)rhs;
+
+	return a < b ? -1 : a == b ? 0 : 1;
+}
+
+bool kcsan_skip_report(unsigned long func_addr)
+{
+	unsigned long symbolsize, offset;
+	unsigned long flags;
+	bool ret = false;
+
+	if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
+		return false;
+	func_addr -= offset; /* get function start */
+
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+	if (report_filterlist.used == 0)
+		goto out;
+
+	/* Sort array if it is unsorted, and then do a binary search. */
+	if (!report_filterlist.sorted) {
+		sort(report_filterlist.addrs, report_filterlist.used,
+		     sizeof(unsigned long), cmp_filterlist_addrs, NULL);
+		report_filterlist.sorted = true;
+	}
+	ret = !!bsearch(&func_addr, report_filterlist.addrs,
+			report_filterlist.used, sizeof(unsigned long),
+			cmp_filterlist_addrs);
+	if (report_filterlist.whitelist)
+		ret = !ret;
+
+out:
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+	return ret;
+}
+
+static void set_report_filterlist_whitelist(bool whitelist)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+	report_filterlist.whitelist = whitelist;
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+}
+
+static void insert_report_filterlist(const char *func)
+{
+	unsigned long flags;
+	unsigned long addr = kallsyms_lookup_name(func);
+
+	if (!addr) {
+		pr_err("KCSAN: could not find function: '%s'\n", func);
+		return;
+	}
+
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+
+	if (report_filterlist.addrs == NULL)
+		report_filterlist.addrs = /* initial allocation */
+			kvmalloc_array(report_filterlist.size,
+				       sizeof(unsigned long), GFP_KERNEL);
+	else if (report_filterlist.used == report_filterlist.size) {
+		/* resize filterlist */
+		unsigned long *new_addrs;
+
+		report_filterlist.size *= 2;
+		new_addrs = kvmalloc_array(report_filterlist.size,
+					   sizeof(unsigned long), GFP_KERNEL);
+		memcpy(new_addrs, report_filterlist.addrs,
+		       report_filterlist.used * sizeof(unsigned long));
+		kvfree(report_filterlist.addrs);
+		report_filterlist.addrs = new_addrs;
+	}
+
+	/* Note: deduplicating should be done in userspace. */
+	report_filterlist.addrs[report_filterlist.used++] =
+		addr; /* already resolved above */
+	report_filterlist.sorted = false;
+
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+}
+
+static int show_info(struct seq_file *file, void *v)
+{
+	int i;
+	unsigned long flags;
+
+	/* show stats */
+	seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
+	for (i = 0; i < kcsan_counter_count; ++i)
+		seq_printf(file, "%s: %ld\n", counter_to_name(i),
+			   atomic_long_read(&counters[i]));
+
+	/* show filter functions, and filter type */
+	spin_lock_irqsave(&report_filterlist_lock, flags);
+	seq_printf(file, "\n%s functions: %s\n",
+		   report_filterlist.whitelist ? "whitelisted" : "blacklisted",
+		   report_filterlist.used == 0 ? "none" : "");
+	for (i = 0; i < report_filterlist.used; ++i)
+		seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
+	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+
+	return 0;
+}
+
+static int debugfs_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, show_info, NULL);
+}
+
+static ssize_t debugfs_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *off)
+{
+	char kbuf[KSYM_NAME_LEN];
+	char *arg;
+	int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
+
+	if (copy_from_user(kbuf, buf, read_len))
+		return -EFAULT;
+	kbuf[read_len] = '\0';
+	arg = strstrip(kbuf);
+
+	if (!strncmp(arg, "on", sizeof("on") - 1))
+		WRITE_ONCE(kcsan_enabled, true);
+	else if (!strncmp(arg, "off", sizeof("off") - 1))
+		WRITE_ONCE(kcsan_enabled, false);
+	else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
+		set_report_filterlist_whitelist(true);
+	else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
+		set_report_filterlist_whitelist(false);
+	else if (arg[0] == '!')
+		insert_report_filterlist(&arg[1]);
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations debugfs_ops = { .read = seq_read,
+						    .open = debugfs_open,
+						    .write = debugfs_write,
+						    .release = single_release };
+
+void __init kcsan_debugfs_init(void)
+{
+	debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
+}
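
For illustration only, a user-space sketch of the lookup done by
kcsan_skip_report(): sort the address list, binary-search it, and invert the
result in whitelist mode. skip_report() and cmp_addrs() are hypothetical names;
qsort() and bsearch() from the C library stand in for the kernel's sort() and
bsearch(), and the locking and lazy "sorted" flag are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/*
 * Returns true if a report for func_addr should be suppressed.
 * Sorts addrs in place on every call (the patch caches this with a flag).
 */
static bool skip_report(unsigned long *addrs, size_t used, bool whitelist,
			unsigned long func_addr)
{
	bool found;

	if (used == 0)
		return false; /* an empty list filters nothing in either mode */

	qsort(addrs, used, sizeof(*addrs), cmp_addrs);
	found = bsearch(&func_addr, addrs, used, sizeof(*addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip all others. */
	return whitelist ? !found : found;
}
```

Note that, as in the patch, an empty list never suppresses a report, so
switching to whitelist mode before adding any function does not silence
everything.
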
diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
new file mode 100644
index 000000000000..8f9b1ce0e59f
--- /dev/null
+++ b/kernel/kcsan/encoding.h
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _MM_KCSAN_ENCODING_H
+#define _MM_KCSAN_ENCODING_H
+
+#include <linux/bits.h>
+#include <linux/log2.h>
+#include <linux/mm.h>
+
+#include "kcsan.h"
+
+#define SLOT_RANGE PAGE_SIZE
+#define INVALID_WATCHPOINT 0
+#define CONSUMED_WATCHPOINT 1
+
+/*
+ * The maximum useful size of accesses for which we set up watchpoints is the
+ * max range of slots we check on an access.
+ */
+#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
+
+/*
+ * Number of bits we use to store size info.
+ */
+#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
+/*
+ * This address encoding discards the upper (1 for is-write + SIZE_BITS) bits;
+ * however, most 64-bit architectures do not use the full 64-bit address space.
+ * Also, in order for a false positive to be observable 2 things need to happen:
+ *
+ *	1. different addresses but with the same encoded address race;
+ *	2. and both map onto the same watchpoint slots;
+ *
+ * Both these are assumed to be very unlikely. However, in case it still
+ * happens, the report logic will filter out the false positive (see report.c).
+ */
+#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
+
+/*
+ * Masks to set/retrieve the encoded data.
+ */
+#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
+#define WATCHPOINT_SIZE_MASK                                                   \
+	GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
+#define WATCHPOINT_ADDR_MASK                                                   \
+	GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
+
+static inline bool check_encodable(unsigned long addr, size_t size)
+{
+	return size <= MAX_ENCODABLE_SIZE;
+}
+
+static inline long encode_watchpoint(unsigned long addr, size_t size,
+				     bool is_write)
+{
+	return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
+		      (size << WATCHPOINT_ADDR_BITS) |
+		      (addr & WATCHPOINT_ADDR_MASK));
+}
+
+static inline bool decode_watchpoint(long watchpoint,
+				     unsigned long *addr_masked, size_t *size,
+				     bool *is_write)
+{
+	if (watchpoint == INVALID_WATCHPOINT ||
+	    watchpoint == CONSUMED_WATCHPOINT)
+		return false;
+
+	*addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
+	*size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
+		WATCHPOINT_ADDR_BITS;
+	*is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
+
+	return true;
+}
+
+/*
+ * Return watchpoint slot for an address.
+ */
+static inline int watchpoint_slot(unsigned long addr)
+{
+	return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
+}
+
+static inline bool matching_access(unsigned long addr1, size_t size1,
+				   unsigned long addr2, size_t size2)
+{
+	unsigned long end_range1 = addr1 + size1 - 1;
+	unsigned long end_range2 = addr2 + size2 - 1;
+
+	return addr1 <= end_range2 && addr2 <= end_range1;
+}
+
+#endif /* _MM_KCSAN_ENCODING_H */
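
For illustration only, a user-space sketch of the watchpoint encoding in
encoding.h, mirroring the mask expressions above. It assumes 64-bit longs,
4 KiB pages and KCSAN_CHECK_ADJACENT == 1, so that MAX_ENCODABLE_SIZE is 8192
and bits_per(8192) is 14; all DEMO_* and demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed configuration: 64-bit longs, 4 KiB pages, one adjacent slot. */
#define DEMO_BITS_PER_LONG 64
#define DEMO_PAGE_SIZE 4096u
#define DEMO_NUM_WATCHPOINTS 64u
#define DEMO_SIZE_BITS 14 /* bits_per(2 * 4096) */
#define DEMO_ADDR_BITS (DEMO_BITS_PER_LONG - 1 - DEMO_SIZE_BITS)

/* User-space stand-in for the kernel's GENMASK(h, l) on 64-bit values. */
#define DEMO_GENMASK(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

#define DEMO_WRITE_MASK (UINT64_C(1) << (DEMO_BITS_PER_LONG - 1))
#define DEMO_SIZE_MASK \
	DEMO_GENMASK(DEMO_BITS_PER_LONG - 2, \
		     DEMO_BITS_PER_LONG - 2 - DEMO_SIZE_BITS)
#define DEMO_ADDR_MASK DEMO_GENMASK(DEMO_BITS_PER_LONG - 3 - DEMO_SIZE_BITS, 0)

/* Pack write flag, access size and masked address into one word. */
static uint64_t demo_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? DEMO_WRITE_MASK : 0) |
	       ((uint64_t)size << DEMO_ADDR_BITS) | (addr & DEMO_ADDR_MASK);
}

/* Unpack; the values 0 and 1 are reserved (invalid / consumed). */
static bool demo_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & DEMO_ADDR_MASK;
	*size = (size_t)((wp & DEMO_SIZE_MASK) >> DEMO_ADDR_BITS);
	*is_write = (wp & DEMO_WRITE_MASK) != 0;
	return true;
}

/* One slot per page-sized range, wrapping around the slot array. */
static unsigned int demo_slot(uint64_t addr)
{
	return (unsigned int)((addr / DEMO_PAGE_SIZE) % DEMO_NUM_WATCHPOINTS);
}
```

Because the upper address bits are discarded, two different addresses can
share an encoding; the report path rechecks the real addresses for this reason.
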
diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
new file mode 100644
index 000000000000..45cf2fffd8a0
--- /dev/null
+++ b/kernel/kcsan/kcsan.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
+ * see Documentation/dev-tools/kcsan.rst.
+ */
+
+#include <linux/export.h>
+
+#include "kcsan.h"
+
+/*
+ * KCSAN uses the same instrumentation that is emitted by supported compilers
+ * for Thread Sanitizer (TSAN).
+ *
+ * When enabled, the compiler emits instrumentation calls (the functions
+ * prefixed with "__tsan" below) for all loads and stores that it generated;
+ * inline asm is not instrumented.
+ */
+
+#define DEFINE_TSAN_READ_WRITE(size)                                           \
+	void __tsan_read##size(void *ptr)                                      \
+	{                                                                      \
+		__kcsan_check_read(ptr, size);                                 \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_read##size);                                      \
+	void __tsan_write##size(void *ptr)                                     \
+	{                                                                      \
+		__kcsan_check_write(ptr, size);                                \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_write##size)
+
+DEFINE_TSAN_READ_WRITE(1);
+DEFINE_TSAN_READ_WRITE(2);
+DEFINE_TSAN_READ_WRITE(4);
+DEFINE_TSAN_READ_WRITE(8);
+DEFINE_TSAN_READ_WRITE(16);
+
+/*
+ * Not all supported compiler versions distinguish aligned/unaligned accesses,
+ * but e.g. recent versions of Clang do.
+ */
+#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
+	void __tsan_unaligned_read##size(void *ptr)                            \
+	{                                                                      \
+		__kcsan_check_read(ptr, size);                                 \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
+	void __tsan_unaligned_write##size(void *ptr)                           \
+	{                                                                      \
+		__kcsan_check_write(ptr, size);                                \
+	}                                                                      \
+	EXPORT_SYMBOL(__tsan_unaligned_write##size)
+
+DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
+DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
+DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
+DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
+
+void __tsan_read_range(void *ptr, size_t size)
+{
+	__kcsan_check_read(ptr, size);
+}
+EXPORT_SYMBOL(__tsan_read_range);
+
+void __tsan_write_range(void *ptr, size_t size)
+{
+	__kcsan_check_write(ptr, size);
+}
+EXPORT_SYMBOL(__tsan_write_range);
+
+/*
+ * The below are not required by KCSAN, but can be emitted by the compiler.
+ */
+void __tsan_func_entry(void *call_pc)
+{
+}
+EXPORT_SYMBOL(__tsan_func_entry);
+void __tsan_func_exit(void)
+{
+}
+EXPORT_SYMBOL(__tsan_func_exit);
+void __tsan_init(void)
+{
+}
+EXPORT_SYMBOL(__tsan_init);
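
For illustration only, a user-space sketch of how the compiler-emitted hooks
in kcsan.c are reached. The demo_* names are hypothetical, chosen so as not to
clash with a real TSAN runtime, and the counting checkers stand in for
__kcsan_check_read()/__kcsan_check_write(). demo_instrumented_copy() shows
conceptually what an instrumented "*dst = *src" looks like, assuming a 4-byte
int; the exact calls a given compiler emits may differ.

```c
#include <stddef.h>

/* Hypothetical counters standing in for the real KCSAN checks. */
static unsigned long demo_reads, demo_writes;

static void demo_check_read(const void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	demo_reads++;
}

static void demo_check_write(const void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	demo_writes++;
}

/* Same shape as DEFINE_TSAN_READ_WRITE(): one thin wrapper per size. */
#define DEFINE_DEMO_READ_WRITE(size)                                           \
	static void demo_tsan_read##size(void *ptr)                            \
	{                                                                      \
		demo_check_read(ptr, size);                                    \
	}                                                                      \
	static void demo_tsan_write##size(void *ptr)                           \
	{                                                                      \
		demo_check_write(ptr, size);                                   \
	}

DEFINE_DEMO_READ_WRITE(4)

/* Conceptual result of instrumenting "*dst = *src" for 4-byte ints. */
static void demo_instrumented_copy(int *dst, int *src)
{
	int tmp;

	demo_tsan_read4(src); /* hook before the load */
	tmp = *src;
	demo_tsan_write4(dst); /* hook before the store */
	*dst = tmp;
}
```
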
diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
new file mode 100644
index 000000000000..429479b3041d
--- /dev/null
+++ b/kernel/kcsan/kcsan.h
@@ -0,0 +1,140 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _MM_KCSAN_KCSAN_H
+#define _MM_KCSAN_KCSAN_H
+
+#include <linux/kcsan.h>
+
+/*
+ * Total number of watchpoints. An address range maps into a specific slot as
+ * specified in `encoding.h`. Although a larger number of watchpoints may not
+ * even be usable due to limited thread count, a larger value will improve
+ * performance due to reducing cache-line contention.
+ */
+#define KCSAN_NUM_WATCHPOINTS 64
+
+/*
+ * The number of adjacent watchpoints to check; the purpose is 2-fold:
+ *
+ *	1. the address slot is already occupied, check if any adjacent slots are
+ *	   free;
+ *	2. accesses that straddle a slot boundary due to size that exceeds a
+ *	   slot's range may check adjacent slots if any watchpoint matches.
+ *
+ * Note that accesses with very large size may still miss a watchpoint; however,
+ * given this should be rare, this is a reasonable trade-off to make, since this
+ * will:
+ *
+ *	1. avoid excessive contention between watchpoint checks and setup;
+ *	2. allow a larger number of simultaneous watchpoints without sacrificing
+ *	   performance.
+ */
+#define KCSAN_CHECK_ADJACENT 1
+
+/*
+ * Globally enable and disable KCSAN.
+ */
+extern bool kcsan_enabled;
+
+/*
+ * Helper that returns true if access to ptr should be considered as an atomic
+ * access, even though it is not explicitly atomic.
+ */
+bool kcsan_is_atomic(const volatile void *ptr);
+
+/*
+ * Initialize debugfs file.
+ */
+void kcsan_debugfs_init(void);
+
+enum kcsan_counter_id {
+	/*
+	 * Number of watchpoints currently in use.
+	 */
+	kcsan_counter_used_watchpoints,
+
+	/*
+	 * Total number of watchpoints set up.
+	 */
+	kcsan_counter_setup_watchpoints,
+
+	/*
+	 * Total number of data-races.
+	 */
+	kcsan_counter_data_races,
+
+	/*
+	 * Number of times no watchpoints were available.
+	 */
+	kcsan_counter_no_capacity,
+
+	/*
+	 * A thread checking a watchpoint raced with another checking thread;
+	 * only one will be reported.
+	 */
+	kcsan_counter_report_races,
+
+	/*
+	 * Observed data value change, but writer thread unknown.
+	 */
+	kcsan_counter_races_unknown_origin,
+
+	/*
+	 * The access cannot be encoded to a valid watchpoint.
+	 */
+	kcsan_counter_unencodable_accesses,
+
+	/*
+	 * Watchpoint encoding caused a watchpoint to fire on mismatching
+	 * accesses.
+	 */
+	kcsan_counter_encoding_false_positives,
+
+	kcsan_counter_count, /* number of counters */
+};
+
+/*
+ * Increment/decrement counter with given id; avoid calling these in fast-path.
+ */
+void kcsan_counter_inc(enum kcsan_counter_id id);
+void kcsan_counter_dec(enum kcsan_counter_id id);
+
+/*
+ * Returns true if data-races in the function symbol that maps to addr (offsets
+ * are ignored) should *not* be reported.
+ */
+bool kcsan_skip_report(unsigned long func_addr);
+
+enum kcsan_report_type {
+	/*
+	 * The thread that set up the watchpoint and briefly stalled was
+	 * signaled that another thread triggered the watchpoint, and thus a
+	 * race was encountered.
+	 */
+	kcsan_report_race_setup,
+
+	/*
+	 * A thread encountered a watchpoint for the access, therefore a race
+	 * was encountered.
+	 */
+	kcsan_report_race_check,
+
+	/*
+	 * A thread encountered a watchpoint for the access, but the other
+	 * racing thread can no longer be signaled that a race occurred.
+	 */
+	kcsan_report_race_check_race,
+
+	/*
+	 * No other thread was observed to race with the access, but the data
+	 * value before and after the stall differs.
+	 */
+	kcsan_report_race_unknown_origin,
+};
+/*
+ * Print a race report from thread that encountered the race.
+ */
+void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
+		  int cpu_id, enum kcsan_report_type type);
+
+#endif /* _MM_KCSAN_KCSAN_H */
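
For illustration only, a user-space sketch of the counter scheme declared in
kcsan.h: an enum whose last member is the number of counters, used to size an
array of atomic counters. C11 <stdatomic.h> stands in for the kernel's
atomic_long_t, only two of the counters are reproduced, and all demo_* names
are hypothetical.

```c
#include <stdatomic.h>

enum demo_counter_id {
	demo_counter_used_watchpoints,  /* watchpoints currently in use */
	demo_counter_setup_watchpoints, /* total watchpoints ever set up */
	demo_counter_count,             /* number of counters; keep last */
};

/* Static storage is zero-initialized, so all counters start at 0. */
static atomic_long demo_counters[demo_counter_count];

static void demo_counter_inc(enum demo_counter_id id)
{
	atomic_fetch_add(&demo_counters[id], 1);
}

static void demo_counter_dec(enum demo_counter_id id)
{
	atomic_fetch_sub(&demo_counters[id], 1);
}

static long demo_counter_read(enum demo_counter_id id)
{
	return atomic_load(&demo_counters[id]);
}
```

A gauge such as used_watchpoints is incremented at setup and decremented when
the watchpoint is removed, while a total such as setup_watchpoints only grows.
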
diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
new file mode 100644
index 000000000000..517db539e4e7
--- /dev/null
+++ b/kernel/kcsan/report.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/kernel.h>
+#include <linux/preempt.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/stacktrace.h>
+
+#include "kcsan.h"
+#include "encoding.h"
+
+/*
+ * Max. number of stack entries to show in the report.
+ */
+#define NUM_STACK_ENTRIES 16
+
+/*
+ * Other thread info: communicated from other racing thread to thread that set
+ * up the watchpoint, which then prints the complete report atomically. Only
+ * need one struct, as all threads should be serialized regardless to print
+ * the reports, with reporting being in the slow-path.
+ */
+static struct {
+	const volatile void *ptr;
+	size_t size;
+	bool is_write;
+	int task_pid;
+	int cpu_id;
+	unsigned long stack_entries[NUM_STACK_ENTRIES];
+	int num_stack_entries;
+} other_info = { .ptr = NULL };
+
+static DEFINE_SPINLOCK(other_info_lock);
+static DEFINE_SPINLOCK(report_lock);
+
+static bool set_or_lock_other_info(unsigned long *flags,
+				   const volatile void *ptr, size_t size,
+				   bool is_write, int cpu_id,
+				   enum kcsan_report_type type)
+{
+	if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
+		return true;
+
+	for (;;) {
+		spin_lock_irqsave(&other_info_lock, *flags);
+
+		switch (type) {
+		case kcsan_report_race_check:
+			if (other_info.ptr != NULL) {
+				/* still in use, retry */
+				break;
+			}
+			other_info.ptr = ptr;
+			other_info.size = size;
+			other_info.is_write = is_write;
+			other_info.task_pid =
+				in_task() ? task_pid_nr(current) : -1;
+			other_info.cpu_id = cpu_id;
+			other_info.num_stack_entries = stack_trace_save(
+				other_info.stack_entries, NUM_STACK_ENTRIES, 1);
+			/*
+			 * other_info may now be consumed by thread we raced
+			 * with.
+			 */
+			spin_unlock_irqrestore(&other_info_lock, *flags);
+			return false;
+
+		case kcsan_report_race_setup:
+			if (other_info.ptr == NULL)
+				break; /* no data available yet, retry */
+
+			/*
+			 * First check if matching based on how watchpoint was
+			 * encoded.
+			 */
+			if (!matching_access((unsigned long)other_info.ptr &
+						     WATCHPOINT_ADDR_MASK,
+					     other_info.size,
+					     (unsigned long)ptr &
+						     WATCHPOINT_ADDR_MASK,
+					     size))
+				break; /* mismatching access, retry */
+
+			if (!matching_access((unsigned long)other_info.ptr,
+					     other_info.size,
+					     (unsigned long)ptr, size)) {
+				/*
+				 * If the actual accesses do not match, this was
+				 * a false positive due to watchpoint encoding.
+				 */
+				other_info.ptr = NULL; /* mark for reuse */
+				kcsan_counter_inc(
+					kcsan_counter_encoding_false_positives);
+				spin_unlock_irqrestore(&other_info_lock,
+						       *flags);
+				return false;
+			}
+
+			/*
+			 * Matching access: keep other_info locked, as this
+			 * thread uses it to print the full report; unlocked in
+			 * end_report.
+			 */
+			return true;
+
+		default:
+			BUG();
+		}
+
+		spin_unlock_irqrestore(&other_info_lock, *flags);
+	}
+}
+
+static void start_report(unsigned long *flags, enum kcsan_report_type type)
+{
+	switch (type) {
+	case kcsan_report_race_setup:
+		/* irqsaved already via other_info_lock */
+		spin_lock(&report_lock);
+		break;
+
+	case kcsan_report_race_unknown_origin:
+		spin_lock_irqsave(&report_lock, *flags);
+		break;
+
+	default:
+		BUG();
+	}
+}
+
+static void end_report(unsigned long *flags, enum kcsan_report_type type)
+{
+	switch (type) {
+	case kcsan_report_race_setup:
+		other_info.ptr = NULL; /* mark for reuse */
+		spin_unlock(&report_lock);
+		spin_unlock_irqrestore(&other_info_lock, *flags);
+		break;
+
+	case kcsan_report_race_unknown_origin:
+		spin_unlock_irqrestore(&report_lock, *flags);
+		break;
+
+	default:
+		BUG();
+	}
+}
+
+static const char *get_access_type(bool is_write)
+{
+	return is_write ? "write" : "read";
+}
+
+/* Return thread description: in task or interrupt. */
+static const char *get_thread_desc(int task_id)
+{
+	if (task_id != -1) {
+		static char buf[32]; /* safe: protected by report_lock */
+
+		snprintf(buf, sizeof(buf), "task %i", task_id);
+		return buf;
+	}
+	return in_nmi() ? "NMI" : "interrupt";
+}
+
+/* Helper to skip KCSAN-related functions in stack-trace. */
+static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
+{
+	char buf[64];
+	int skip = 0;
+
+	for (; skip < num_entries; ++skip) {
+		snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
+		if (!strnstr(buf, "csan_", sizeof(buf)) &&
+		    !strnstr(buf, "tsan_", sizeof(buf)) &&
+		    !strnstr(buf, "_once_size", sizeof(buf))) {
+			break;
+		}
+	}
+	return skip;
+}
+
+/* Compares symbolized strings of addr1 and addr2. */
+static int sym_strcmp(void *addr1, void *addr2)
+{
+	char buf1[64];
+	char buf2[64];
+
+	snprintf(buf1, sizeof(buf1), "%pS", addr1);
+	snprintf(buf2, sizeof(buf2), "%pS", addr2);
+	return strncmp(buf1, buf2, sizeof(buf1));
+}
+
+/*
+ * Returns true if a report was generated, false otherwise.
+ */
+static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
+			  int cpu_id, enum kcsan_report_type type)
+{
+	unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
+	int num_stack_entries =
+		stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
+	int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
+	int other_skipnr;
+
+	/* Check if the top stackframe is in a blacklisted function. */
+	if (kcsan_skip_report(stack_entries[skipnr]))
+		return false;
+	if (type == kcsan_report_race_setup) {
+		other_skipnr = get_stack_skipnr(other_info.stack_entries,
+						other_info.num_stack_entries);
+		if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
+			return false;
+	}
+
+	/* Print report header. */
+	pr_err("==================================================================\n");
+	switch (type) {
+	case kcsan_report_race_setup: {
+		void *this_fn = (void *)stack_entries[skipnr];
+		void *other_fn = (void *)other_info.stack_entries[other_skipnr];
+		int cmp;
+
+		/*
+		 * Order functions lexicographically for consistent bug titles.
+		 * Do not print offset of functions to keep title short.
+		 */
+		cmp = sym_strcmp(other_fn, this_fn);
+		pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
+		       cmp < 0 ? other_fn : this_fn,
+		       cmp < 0 ? this_fn : other_fn);
+	} break;
+
+	case kcsan_report_race_unknown_origin:
+		pr_err("BUG: KCSAN: data-race in %pS\n",
+		       (void *)stack_entries[skipnr]);
+		break;
+
+	default:
+		BUG();
+	}
+
+	pr_err("\n");
+
+	/* Print information about the racing accesses. */
+	switch (type) {
+	case kcsan_report_race_setup:
+		pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
+		       get_access_type(other_info.is_write), other_info.ptr,
+		       other_info.size, get_thread_desc(other_info.task_pid),
+		       other_info.cpu_id);
+
+		/* Print the other thread's stack trace. */
+		stack_trace_print(other_info.stack_entries + other_skipnr,
+				  other_info.num_stack_entries - other_skipnr,
+				  0);
+
+		pr_err("\n");
+		pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
+		       get_access_type(is_write), ptr, size,
+		       get_thread_desc(in_task() ? task_pid_nr(current) : -1),
+		       cpu_id);
+		break;
+
+	case kcsan_report_race_unknown_origin:
+		pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
+		       get_access_type(is_write), ptr, size,
+		       get_thread_desc(in_task() ? task_pid_nr(current) : -1),
+		       cpu_id);
+		break;
+
+	default:
+		BUG();
+	}
+	/* Print stack trace of this thread. */
+	stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
+			  0);
+
+	/* Print report footer. */
+	pr_err("\n");
+	pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
+	dump_stack_print_info(KERN_DEFAULT);
+	pr_err("==================================================================\n");
+
+	return true;
+}
+
+void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
+		  int cpu_id, enum kcsan_report_type type)
+{
+	unsigned long flags = 0;
+
+	if (type == kcsan_report_race_check_race)
+		return;
+
+	kcsan_disable_current();
+	if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
+		start_report(&flags, type);
+		if (print_summary(ptr, size, is_write, cpu_id, type) &&
+		    panic_on_warn)
+			panic("panic_on_warn set ...\n");
+		end_report(&flags, type);
+	}
+	kcsan_enable_current();
+}
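
For illustration only, a user-space sketch of two helpers from report.c:
skipping KCSAN-internal frames at the top of a stack trace, and ordering the
two function names so that the same pair of racing functions always yields the
same bug title. Frames are plain symbol-name strings here instead of addresses
formatted with %ps, and the demo_* names and example symbols are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Count leading frames that belong to the sanitizer runtime itself. */
static int demo_stack_skipnr(const char *const frames[], int num_entries)
{
	int skip = 0;

	for (; skip < num_entries; ++skip) {
		if (!strstr(frames[skip], "csan_") &&
		    !strstr(frames[skip], "tsan_") &&
		    !strstr(frames[skip], "_once_size"))
			break;
	}
	return skip;
}

/* Build a title that does not depend on which thread reports the race. */
static void demo_format_title(char *buf, size_t len, const char *this_fn,
			      const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
		 cmp < 0 ? other_fn : this_fn,
		 cmp < 0 ? this_fn : other_fn);
}
```

A stable title matters for tools that deduplicate reports by their first line.
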
diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
new file mode 100644
index 000000000000..68c896a24529
--- /dev/null
+++ b/kernel/kcsan/test.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/printk.h>
+#include <linux/random.h>
+#include <linux/types.h>
+
+#include "encoding.h"
+
+#define ITERS_PER_TEST 2000
+
+/* Test requirements. */
+static bool test_requires(void)
+{
+	/* random should be initialized */
+	return prandom_u32() + prandom_u32() != 0;
+}
+
+/* Test watchpoint encode and decode. */
+static bool test_encode_decode(void)
+{
+	int i;
+
+	for (i = 0; i < ITERS_PER_TEST; ++i) {
+		size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
+		bool is_write = prandom_u32() % 2;
+		unsigned long addr;
+
+		prandom_bytes(&addr, sizeof(addr));
+		if (WARN_ON(!check_encodable(addr, size)))
+			return false;
+
+		/* encode and decode */
+		{
+			const long encoded_watchpoint =
+				encode_watchpoint(addr, size, is_write);
+			unsigned long verif_masked_addr;
+			size_t verif_size;
+			bool verif_is_write;
+
+			/* check special watchpoints */
+			if (WARN_ON(decode_watchpoint(
+				    INVALID_WATCHPOINT, &verif_masked_addr,
+				    &verif_size, &verif_is_write)))
+				return false;
+			if (WARN_ON(decode_watchpoint(
+				    CONSUMED_WATCHPOINT, &verif_masked_addr,
+				    &verif_size, &verif_is_write)))
+				return false;
+
+			/* check decoding watchpoint returns same data */
+			if (WARN_ON(!decode_watchpoint(
+				    encoded_watchpoint, &verif_masked_addr,
+				    &verif_size, &verif_is_write)))
+				return false;
+			if (WARN_ON(verif_masked_addr !=
+				    (addr & WATCHPOINT_ADDR_MASK)))
+				goto fail;
+			if (WARN_ON(verif_size != size))
+				goto fail;
+			if (WARN_ON(is_write != verif_is_write))
+				goto fail;
+
+			continue;
+fail:
+			pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
+			       __func__, is_write ? "write" : "read", size,
+			       addr, encoded_watchpoint,
+			       verif_is_write ? "write" : "read", verif_size,
+			       verif_masked_addr);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static bool test_matching_access(void)
+{
+	if (WARN_ON(!matching_access(10, 1, 10, 1)))
+		return false;
+	if (WARN_ON(!matching_access(10, 2, 11, 1)))
+		return false;
+	if (WARN_ON(!matching_access(10, 1, 9, 2)))
+		return false;
+	if (WARN_ON(matching_access(10, 1, 11, 1)))
+		return false;
+	if (WARN_ON(matching_access(9, 1, 10, 1)))
+		return false;
+	return true;
+}
+
+static int __init kcsan_selftest(void)
+{
+	int passed = 0;
+	int total = 0;
+
+#define RUN_TEST(do_test)                                                      \
+	do {                                                                   \
+		++total;                                                       \
+		if (do_test())                                                 \
+			++passed;                                              \
+		else                                                           \
+			pr_err("KCSAN selftest: " #do_test " failed\n");       \
+	} while (0)
+
+	RUN_TEST(test_requires);
+	RUN_TEST(test_encode_decode);
+	RUN_TEST(test_matching_access);
+
+	pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
+	if (passed != total)
+		panic("KCSAN selftests failed");
+	return 0;
+}
+postcore_initcall(kcsan_selftest);
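
For illustration only, a user-space sketch of the overlap test exercised by
test_matching_access(). Each access is treated as the closed byte range
[addr, addr + size - 1], and two accesses match when the ranges intersect.
demo_ranges_overlap() is a hypothetical name, and sizes are assumed to be at
least 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if [addr1, addr1+size1-1] and [addr2, addr2+size2-1] share a byte. */
static bool demo_ranges_overlap(unsigned long addr1, size_t size1,
				unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of access 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of access 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

Using the last byte instead of one-past-the-end is what makes adjacent
accesses, such as 1 byte at 9 and 1 byte at 10, correctly count as
non-overlapping.
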
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 93d97f9b0157..35accd1d93de 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
 
 source "lib/Kconfig.ubsan"
 
+source "lib/Kconfig.kcsan"
+
 config ARCH_HAS_DEVMEM_IS_ALLOWED
 	bool
 
diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
new file mode 100644
index 000000000000..3e1f1acfb24b
--- /dev/null
+++ b/lib/Kconfig.kcsan
@@ -0,0 +1,88 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config HAVE_ARCH_KCSAN
+	bool
+
+menuconfig KCSAN
+	bool "KCSAN: watchpoint-based dynamic data-race detector"
+	depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
+	default n
+	help
+	  Kernel Concurrency Sanitizer is a dynamic data-race detector, which
+	  uses a watchpoint-based sampling approach to detect races.
+
+if KCSAN
+
+config KCSAN_SELFTEST
+	bool "KCSAN: perform short selftests on boot"
+	default y
+	help
+	  Run KCSAN selftests on boot. On test failure, causes kernel to panic.
+
+config KCSAN_EARLY_ENABLE
+	bool "KCSAN: early enable"
+	default y
+	help
+	  If KCSAN should be enabled globally as soon as possible. KCSAN can
+	  later be enabled/disabled via debugfs.
+
+config KCSAN_UDELAY_MAX_TASK
+	int "KCSAN: maximum delay in microseconds (for tasks)"
+	default 80
+	help
+	  For tasks, the max. microsecond delay after setting up a watchpoint.
+
+config KCSAN_UDELAY_MAX_INTERRUPT
+	int "KCSAN: maximum delay in microseconds (for interrupts)"
+	default 20
+	help
+	  For interrupts, the max. microsecond delay after setting up a watchpoint.
+
+config KCSAN_DELAY_RANDOMIZE
+	bool "KCSAN: randomize delays"
+	default y
+	help
+	  If delays should be randomized; if false, the chosen delay is always
+	  the applicable maximum value defined above.
+
+config KCSAN_WATCH_SKIP_INST
+	int "KCSAN: watchpoint instruction skip"
+	default 2000
+	help
+	  The number of per-CPU memory operations to skip watching, before
+	  another watchpoint is set up; in other words, 1 in
+	  KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
+	  watchpoint. A smaller value results in more aggressive race
+	  detection, whereas a larger value improves system performance at the
+	  cost of missing some races.
+
+config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
+	bool "KCSAN: report races of unknown origin"
+	default y
+	help
+	  If KCSAN should report races where only one access is known, and the
+	  conflicting access is of unknown origin. This type of race is
+	  reported if it was only possible to infer a race due to a data-value
+	  change while an access is being delayed on a watchpoint.
+
+config KCSAN_IGNORE_ATOMICS
+	bool "KCSAN: do not instrument marked atomic accesses"
+	default n
+	help
+	  If enabled, never instruments marked atomic accesses. This results in
+	  not reporting data-races where one access is atomic and the other is
+	  a plain access.
+
+config KCSAN_PLAIN_WRITE_PRETEND_ONCE
+	bool "KCSAN: pretend plain writes are WRITE_ONCE"
+	default n
+	help
+	  This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
+	  This option should only be used to prune initial data-races found in
+	  existing code.
+
+config KCSAN_DEBUG
+	bool "Debugging of KCSAN internals"
+	default n
+
+endif # KCSAN
diff --git a/lib/Makefile b/lib/Makefile
index c5892807e06f..778ab704e3ad 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
 CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
 endif
 
+# Used by KCSAN while enabled, avoid recursion.
+KCSAN_SANITIZE_random32.o := n
+
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 rbtree.o radix-tree.o timerqueue.o xarray.o \
 	 idr.o extable.o \
diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
new file mode 100644
index 000000000000..caf1111a28ae
--- /dev/null
+++ b/scripts/Makefile.kcsan
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+ifdef CONFIG_KCSAN
+
+CFLAGS_KCSAN := -fsanitize=thread
+
+endif # CONFIG_KCSAN
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 179d55af5852..0e78abab7d83 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
 	$(CFLAGS_KCOV))
 endif
 
+#
+# Enable ConcurrencySanitizer flags for kernel except some files or directories
+# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
+#
+ifeq ($(CONFIG_KCSAN),y)
+_c_flags += $(if $(patsubst n%,, \
+	$(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
+	$(CFLAGS_KCSAN))
+endif
+
 # $(srctree)/$(src) for including checkin headers from generated source files
 # $(objtree)/$(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
-- 
2.23.0.866.gb869b98d4c-goog
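
As an illustration of the KCSAN_WATCH_SKIP_INST help text in the patch above,
the following userspace C sketch models the "1 in N memory operations sets up
a watchpoint" sampling with a simple countdown. It is not the KCSAN runtime:
the names sampler, sampler_init, sampler_should_watch and
sampler_count_watchpoints are hypothetical, a single struct stands in for the
per-CPU state, and the deterministic countdown is an assumption made only for
this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's sampling state. */
struct sampler {
	long skip_inst;	/* models CONFIG_KCSAN_WATCH_SKIP_INST */
	long remaining;	/* memory operations left before the next watchpoint */
};

static void sampler_init(struct sampler *s, long skip_inst)
{
	s->skip_inst = skip_inst;
	s->remaining = skip_inst;
}

/*
 * Called once per instrumented memory operation. Returns true for 1 in
 * skip_inst calls, i.e. when this operation should set up a watchpoint.
 */
static bool sampler_should_watch(struct sampler *s)
{
	if (--s->remaining > 0)
		return false;
	s->remaining = s->skip_inst;
	return true;
}

/* Count how many of nr_ops operations would set up a watchpoint. */
static long sampler_count_watchpoints(long skip_inst, long nr_ops)
{
	struct sampler s;
	long i, watched = 0;

	sampler_init(&s, skip_inst);
	for (i = 0; i < nr_ops; i++)
		watched += sampler_should_watch(&s);
	return watched;
}

/* Usage: with the default of 2000, 10000 operations give 5 watchpoints. */
static void sampler_demo(void)
{
	assert(sampler_count_watchpoints(2000, 10000) == 5);
	assert(sampler_count_watchpoints(2000, 1999) == 0);
}
```

In this model, halving skip_inst doubles the number of watchpoints per unit of
work, which is the trade-off the help text describes: more aggressive race
detection against more overhead.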



^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 2/8] objtool, kcsan: Add KCSAN runtime functions to whitelist
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:12   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:12 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

This patch adds KCSAN runtime functions to the objtool whitelist.

Signed-off-by: Marco Elver <elver@google.com>
---
 tools/objtool/check.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 044c9a3cb247..d1acc867b43c 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -466,6 +466,23 @@ static const char *uaccess_safe_builtin[] = {
 	"__asan_report_store4_noabort",
 	"__asan_report_store8_noabort",
 	"__asan_report_store16_noabort",
+	/* KCSAN */
+	"__kcsan_check_watchpoint",
+	"__kcsan_setup_watchpoint",
+	/* KCSAN/TSAN out-of-line */
+	"__tsan_func_entry",
+	"__tsan_func_exit",
+	"__tsan_read_range",
+	"__tsan_read1",
+	"__tsan_read2",
+	"__tsan_read4",
+	"__tsan_read8",
+	"__tsan_read16",
+	"__tsan_write1",
+	"__tsan_write2",
+	"__tsan_write4",
+	"__tsan_write8",
+	"__tsan_write16",
 	/* KCOV */
 	"write_comp_data",
 	"__sanitizer_cov_trace_pc",
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 3/8] build, kcsan: Add KCSAN build exceptions
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:13   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:13 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

This blacklists several compilation units from KCSAN. See the respective
inline comments for the reasoning.

Signed-off-by: Marco Elver <elver@google.com>
---
 kernel/Makefile       | 5 +++++
 kernel/sched/Makefile | 6 ++++++
 mm/Makefile           | 8 ++++++++
 3 files changed, 19 insertions(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index 74ab46e2ebd1..4a597a68b8bc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -23,6 +23,9 @@ endif
 # Prevents flicker of uninteresting __do_softirq()/__local_bh_disable_ip()
 # in coverage traces.
 KCOV_INSTRUMENT_softirq.o := n
+# Avoid KCSAN instrumentation in softirq ("No shared variables, all the data
+# are CPU local" => assume no data-races), to reduce overhead in interrupts.
+KCSAN_SANITIZE_softirq.o := n
 # These are called from save_stack_trace() on slub debug path,
 # and produce insane amounts of uninteresting coverage.
 KCOV_INSTRUMENT_module.o := n
@@ -30,6 +33,7 @@ KCOV_INSTRUMENT_extable.o := n
 # Don't self-instrument.
 KCOV_INSTRUMENT_kcov.o := n
 KASAN_SANITIZE_kcov.o := n
+KCSAN_SANITIZE_kcov.o := n
 CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
 
 # cond_syscall is currently not LTO compatible
@@ -118,6 +122,7 @@ obj-$(CONFIG_RSEQ) += rseq.o
 
 obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
 KASAN_SANITIZE_stackleak.o := n
+KCSAN_SANITIZE_stackleak.o := n
 KCOV_INSTRUMENT_stackleak.o := n
 
 $(obj)/configs.o: $(obj)/config_data.gz
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 21fb5a5662b5..e9307a9c54e7 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -7,6 +7,12 @@ endif
 # that is not a function of syscall inputs. E.g. involuntary context switches.
 KCOV_INSTRUMENT := n
 
+# There are numerous races here; however, most of them are due to plain
+# accesses. Reporting them would make it even harder for syzbot to find
+# reproducers, because these bugs trigger without specific input. Disable by
+# default, but this should be re-enabled eventually.
+KCSAN_SANITIZE := n
+
 ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
 # needed for x86 only.  Why this used to be enabled for all architectures is beyond
diff --git a/mm/Makefile b/mm/Makefile
index d996846697ef..33ea0154dd2d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -7,6 +7,14 @@ KASAN_SANITIZE_slab_common.o := n
 KASAN_SANITIZE_slab.o := n
 KASAN_SANITIZE_slub.o := n
 
+# These produce frequent data-race reports: most of them are due to races on
+# the same word but accesses to different bits of that word. Re-enable KCSAN
+# for these when we have more consensus on what to do about them.
+KCSAN_SANITIZE_slab_common.o := n
+KCSAN_SANITIZE_slab.o := n
+KCSAN_SANITIZE_slub.o := n
+KCSAN_SANITIZE_page_alloc.o := n
+
 # These files are disabled because they produce non-interesting and/or
 # flaky coverage that is not a function of syscall inputs. E.g. slab is out of
 # free pages, or a task is migrated between nodes.
-- 
2.23.0.866.gb869b98d4c-goog
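
The per-object (KCSAN_SANITIZE_foo.o := n) and per-directory
(KCSAN_SANITIZE := n) opt-outs in this patch are consumed by the _c_flags rule
that patch 1/8 adds to scripts/Makefile.lib. A minimal C model of that make
expression is sketched below, to show which combinations end up with
CFLAGS_KCSAN applied. The function name kcsan_flags_applied is hypothetical,
an unset make variable is represented by an empty string, and the values are
assumed to contain no whitespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Model of:
 *   $(if $(patsubst n%,, \
 *	$(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), $(CFLAGS_KCSAN))
 * per_obj and per_dir are the values of the two make variables.
 */
static bool kcsan_flags_applied(const char *per_obj, const char *per_dir)
{
	char word[64];

	/* Concatenate the per-object value, the per-directory value and "y". */
	snprintf(word, sizeof(word), "%s%sy", per_obj, per_dir);

	/*
	 * patsubst n%,, turns any word starting with 'n' into the empty
	 * string, and $(if ...) only expands the flags for a non-empty result.
	 */
	return word[0] != 'n';
}

/* Usage: the cases that occur in the Makefiles touched by this patch. */
static void kcsan_flags_demo(void)
{
	assert(kcsan_flags_applied("", ""));	/* nothing set: instrumented */
	assert(!kcsan_flags_applied("n", ""));	/* e.g. softirq.o */
	assert(!kcsan_flags_applied("", "n"));	/* e.g. all of kernel/sched/ */
}
```

Because the per-object value comes first in the concatenation, a per-object
"y" would take precedence over a directory-wide "n" in this model.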


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 4/8] seqlock, kcsan: Add annotations for KCSAN
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:13   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:13 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

Since seqlocks in the Linux kernel do not require the use of marked
atomic accesses in critical sections, we teach KCSAN to assume such
accesses are atomic. KCSAN also pretends that writes to `sequence` are
atomic, although plain writes are currently used (their corresponding
reads are READ_ONCE).

Further, to avoid false positives in the absence of a clear end of a
seqlock reader critical section (only when using the raw interface),
KCSAN assumes that a fixed number of accesses after the start of a
seqlock critical section are atomic.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/seqlock.h | 44 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index bcf4cf26b8c8..1e425831a7ed 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -37,8 +37,24 @@
 #include <linux/preempt.h>
 #include <linux/lockdep.h>
 #include <linux/compiler.h>
+#include <linux/kcsan.h>
 #include <asm/processor.h>
 
+/*
+ * The seqlock interface does not prescribe a precise sequence of read
+ * begin/retry/end. For readers, typically there is a call to
+ * read_seqcount_begin() and read_seqcount_retry(); however, there are more
+ * esoteric cases which do not follow this pattern.
+ *
+ * As a consequence, we take the following best-effort approach for *raw* usage
+ * of seqlocks under KCSAN: upon beginning a seq-reader critical section,
+ * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
+ * atomics; if there is a matching read_seqcount_retry() call, no following
+ * memory operations are considered atomic. Non-raw usage of seqlocks is not
+ * affected.
+ */
+#define KCSAN_SEQLOCK_REGION_MAX 1000
+
 /*
  * Version using sequence counter only.
  * This can be used when code has its own mutex protecting the
@@ -115,6 +131,7 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 		cpu_relax();
 		goto repeat;
 	}
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
 	return ret;
 }
 
@@ -131,6 +148,7 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
 	return ret;
 }
 
@@ -183,6 +201,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = READ_ONCE(s->sequence);
 	smp_rmb();
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
 	return ret & ~1;
 }
 
@@ -202,7 +221,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
  */
 static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
 {
-	return unlikely(s->sequence != start);
+	kcsan_atomic_next(0);
+	return unlikely(READ_ONCE(s->sequence) != start);
 }
 
 /**
@@ -225,6 +245,7 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 
 static inline void raw_write_seqcount_begin(seqcount_t *s)
 {
+	kcsan_begin_atomic(true);
 	s->sequence++;
 	smp_wmb();
 }
@@ -233,6 +254,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
 {
 	smp_wmb();
 	s->sequence++;
+	kcsan_end_atomic(true);
 }
 
 /**
@@ -262,18 +284,20 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *
  *      void write(void)
  *      {
- *              Y = true;
+ *              WRITE_ONCE(Y, true);
  *
  *              raw_write_seqcount_barrier(seq);
  *
- *              X = false;
+ *              WRITE_ONCE(X, false);
  *      }
  */
 static inline void raw_write_seqcount_barrier(seqcount_t *s)
 {
+	kcsan_begin_atomic(true);
 	s->sequence++;
 	smp_wmb();
 	s->sequence++;
+	kcsan_end_atomic(true);
 }
 
 static inline int raw_read_seqcount_latch(seqcount_t *s)
@@ -398,7 +422,9 @@ static inline void write_seqcount_end(seqcount_t *s)
 static inline void write_seqcount_invalidate(seqcount_t *s)
 {
 	smp_wmb();
+	kcsan_begin_atomic(true);
 	s->sequence+=2;
+	kcsan_end_atomic(true);
 }
 
 typedef struct {
@@ -430,11 +456,21 @@ typedef struct {
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	return read_seqcount_begin(&sl->seqcount);
+	unsigned ret = read_seqcount_begin(&sl->seqcount);
+
+	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry */
+	kcsan_begin_atomic(false);
+	return ret;
 }
 
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
+	/*
+	 * Assume not nested: read_seqretry may be called multiple times when
+	 * completing read critical section.
+	 */
+	kcsan_end_atomic(false);
+
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-- 
2.23.0.866.gb869b98d4c-goog
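
The patch above only shows the call sites of kcsan_atomic_next(),
kcsan_begin_atomic() and kcsan_end_atomic(); their implementation is not part
of this message. The userspace C sketch below is one plausible reading of
their semantics, inferred from the comments in the patch. Everything in it is
an assumption: the model_* names and struct model_ctx are hypothetical, and
interpreting the bool argument as "nestable" (true) versus "flat" (false) is a
guess based on the "Assume not nested" comment in read_seqretry().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; the real KCSAN context is not shown here. */
struct model_ctx {
	int atomic_next;	/* accesses still to be treated as atomic */
	int nest_count;		/* depth of nestable atomic regions */
	bool in_flat;		/* inside a non-nestable atomic region */
};

/* Treat the next n accesses as atomic; n == 0 cancels the window. */
static void model_atomic_next(struct model_ctx *c, int n)
{
	c->atomic_next = n;
}

static void model_begin_atomic(struct model_ctx *c, bool nest)
{
	if (nest)
		c->nest_count++;
	else
		c->in_flat = true;
}

static void model_end_atomic(struct model_ctx *c, bool nest)
{
	if (nest)
		c->nest_count--;
	else
		c->in_flat = false;
}

/* Called per plain access: should it be treated as if it were marked? */
static bool model_is_atomic(struct model_ctx *c)
{
	if (c->atomic_next > 0) {
		c->atomic_next--;
		return true;
	}
	return c->nest_count > 0 || c->in_flat;
}

static void atomic_region_demo(void)
{
	struct model_ctx c = { 0 };

	/* Raw reader: the next 3 accesses are atomic, the 4th is not. */
	model_atomic_next(&c, 3);
	assert(model_is_atomic(&c) && model_is_atomic(&c) && model_is_atomic(&c));
	assert(!model_is_atomic(&c));

	/* A matching retry call cancels whatever is left of the window. */
	model_atomic_next(&c, 1000);
	model_atomic_next(&c, 0);
	assert(!model_is_atomic(&c));

	/* Flat region: ending it more than once is harmless. */
	model_begin_atomic(&c, false);
	assert(model_is_atomic(&c));
	model_end_atomic(&c, false);
	model_end_atomic(&c, false);
	assert(!model_is_atomic(&c));
}
```

Under this reading, the flat variant suits read_seqretry(), which may be
called several times for one read_seqbegin(), while the counting variant
suits writer sections that may nest.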


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 5/8] seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:13   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:13 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

This patch proposes to require marked atomic accesses surrounding
raw_write_seqcount_barrier. We reason that otherwise there is no way to
guarantee either propagation or atomicity of writes before/after the
barrier [1]. For example, suppose the compiler tears stores either before
or after the barrier; in this case, readers may observe a partial value,
and because readers are unaware that writes are going on (the writes are
not in a seq-writer critical section), they will complete the seq-reader
critical section while having observed some partial state.
[1] https://lwn.net/Articles/793253/

This came up when designing and implementing KCSAN, because KCSAN would
flag these accesses as data-races. After careful analysis, the reasoning
above led us to conclude that the best thing to do is to propose an
amendment to the raw_write_seqcount_barrier usage.

Signed-off-by: Marco Elver <elver@google.com>
---
 include/linux/seqlock.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 1e425831a7ed..5d50aad53b47 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -265,6 +265,13 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  * usual consistency guarantee. It is one wmb cheaper, because we can
  * collapse the two back-to-back wmb()s.
  *
+ * Note that writes surrounding the barrier should be declared atomic (e.g.
+ * via WRITE_ONCE): a) to ensure the writes become visible to other threads
+ * atomically, avoiding compiler optimizations; b) to document which writes are
+ * meant to propagate to the reader critical section. This is necessary because
+ * neither the writes before nor after the barrier are enclosed in a seq-writer
+ * critical section that would ensure readers are aware of ongoing writes.
+ *
  *      seqcount_t seq;
  *      bool X = true, Y = false;
  *
-- 
2.23.0.866.gb869b98d4c-goog
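
To make the proposed usage concrete, here is a userspace approximation of the
X/Y example from the raw_write_seqcount_barrier() comment, written with C11
atomics. It is a sketch under several assumptions, not kernel code: the
MODEL_* macros and model_* functions are hypothetical stand-ins, relaxed
atomic accesses approximate READ_ONCE()/WRITE_ONCE(), release and acquire
fences approximate smp_wmb() and smp_rmb(), and the reader is a conventional
begin/retry loop, since the reader side is not part of the quoted hunk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint seq;			/* models seqcount_t::sequence */
static atomic_bool X = true, Y = false;

/* Stand-ins for READ_ONCE()/WRITE_ONCE(): relaxed atomic accesses. */
#define MODEL_READ_ONCE(v)	atomic_load_explicit(&(v), memory_order_relaxed)
#define MODEL_WRITE_ONCE(v, x)	atomic_store_explicit(&(v), (x), memory_order_relaxed)

static unsigned model_read_begin(void)
{
	unsigned s;

	/* Wait for an even count, i.e. no writer in progress. */
	while ((s = MODEL_READ_ONCE(seq)) & 1)
		;
	atomic_thread_fence(memory_order_acquire);	/* approximates smp_rmb() */
	return s;
}

static bool model_read_retry(unsigned start)
{
	atomic_thread_fence(memory_order_acquire);	/* approximates smp_rmb() */
	return MODEL_READ_ONCE(seq) != start;
}

static void model_write_barrier(void)
{
	atomic_fetch_add_explicit(&seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* approximates smp_wmb() */
	atomic_fetch_add_explicit(&seq, 1, memory_order_relaxed);
}

/* Reader: retries until it has seen X and Y under one sequence value. */
static void model_read(bool *x, bool *y)
{
	unsigned s;

	do {
		s = model_read_begin();
		*x = MODEL_READ_ONCE(X);
		*y = MODEL_READ_ONCE(Y);
	} while (model_read_retry(s));
}

/* Writer: both stores around the barrier are marked, as proposed. */
static void model_write(void)
{
	MODEL_WRITE_ONCE(Y, true);
	model_write_barrier();
	MODEL_WRITE_ONCE(X, false);
}

/* Single-threaded usage: checks the logic only, not the concurrency. */
static void seq_barrier_demo(void)
{
	bool x, y;

	model_read(&x, &y);
	assert(x && !y);
	model_write();
	model_read(&x, &y);
	assert(!x && y);
}
```

Since X starts out true and Y is set to true before X is cleared, a reader
that completes without retrying is not expected to see both as false. Marking
the two stores keeps each of them a single, untorn write.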


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 6/8] asm-generic, kcsan: Add KCSAN instrumentation for bitops
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:13   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:13 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

Add explicit KCSAN checks for bitops.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Use kcsan_check{,_atomic}_{read,write} instead of
  kcsan_check_{access,atomic}.
---
 include/asm-generic/bitops-instrumented.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/asm-generic/bitops-instrumented.h b/include/asm-generic/bitops-instrumented.h
index ddd1c6d9d8db..864d707cdb87 100644
--- a/include/asm-generic/bitops-instrumented.h
+++ b/include/asm-generic/bitops-instrumented.h
@@ -12,6 +12,7 @@
 #define _ASM_GENERIC_BITOPS_INSTRUMENTED_H
 
 #include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
 
 /**
  * set_bit - Atomically set a bit in memory
@@ -26,6 +27,7 @@
 static inline void set_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	arch_set_bit(nr, addr);
 }
 
@@ -41,6 +43,7 @@ static inline void set_bit(long nr, volatile unsigned long *addr)
 static inline void __set_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___set_bit(nr, addr);
 }
 
@@ -54,6 +57,7 @@ static inline void __set_bit(long nr, volatile unsigned long *addr)
 static inline void clear_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	arch_clear_bit(nr, addr);
 }
 
@@ -69,6 +73,7 @@ static inline void clear_bit(long nr, volatile unsigned long *addr)
 static inline void __clear_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit(nr, addr);
 }
 
@@ -82,6 +87,7 @@ static inline void __clear_bit(long nr, volatile unsigned long *addr)
 static inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	arch_clear_bit_unlock(nr, addr);
 }
 
@@ -97,6 +103,7 @@ static inline void clear_bit_unlock(long nr, volatile unsigned long *addr)
 static inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___clear_bit_unlock(nr, addr);
 }
 
@@ -113,6 +120,7 @@ static inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
 static inline void change_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	arch_change_bit(nr, addr);
 }
 
@@ -128,6 +136,7 @@ static inline void change_bit(long nr, volatile unsigned long *addr)
 static inline void __change_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	arch___change_bit(nr, addr);
 }
 
@@ -141,6 +150,7 @@ static inline void __change_bit(long nr, volatile unsigned long *addr)
 static inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_set_bit(nr, addr);
 }
 
@@ -155,6 +165,7 @@ static inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
 static inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch___test_and_set_bit(nr, addr);
 }
 
@@ -170,6 +181,7 @@ static inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
 static inline bool test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_set_bit_lock(nr, addr);
 }
 
@@ -183,6 +195,7 @@ static inline bool test_and_set_bit_lock(long nr, volatile unsigned long *addr)
 static inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_clear_bit(nr, addr);
 }
 
@@ -197,6 +210,7 @@ static inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
 static inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch___test_and_clear_bit(nr, addr);
 }
 
@@ -210,6 +224,7 @@ static inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
 static inline bool test_and_change_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_and_change_bit(nr, addr);
 }
 
@@ -224,6 +239,7 @@ static inline bool test_and_change_bit(long nr, volatile unsigned long *addr)
 static inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch___test_and_change_bit(nr, addr);
 }
 
@@ -235,6 +251,7 @@ static inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
 static inline bool test_bit(long nr, const volatile unsigned long *addr)
 {
 	kasan_check_read(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_read(addr + BIT_WORD(nr), sizeof(long));
 	return arch_test_bit(nr, addr);
 }
 
@@ -254,6 +271,7 @@ static inline bool
 clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
 {
 	kasan_check_write(addr + BIT_WORD(nr), sizeof(long));
+	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
 	return arch_clear_bit_unlock_is_negative_byte(nr, addr);
 }
 /* Let everybody know we have it. */
-- 
2.23.0.866.gb869b98d4c-goog
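
For readers following the bitops hunks above, here is a minimal userspace
sketch of the pattern, not the kernel implementation. It assumes stub
versions of kcsan_check_write() and kcsan_check_atomic_write() that only
record their arguments. The demo_* functions, struct check_record and
last_check are hypothetical names used for illustration. The sketch shows
that addr + BIT_WORD(nr) selects the long that holds bit nr, and that the
atomic variant (set_bit) is reported as an atomic write while the
non-atomic variant (__set_bit) is reported as a plain write.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's BITS_PER_LONG and BIT_WORD(). */
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define BIT_WORD(nr)  ((nr) / BITS_PER_LONG)

/* Hypothetical record of the most recent check, for illustration only. */
struct check_record {
	const volatile void *ptr;
	size_t size;
	int is_atomic;
};

static struct check_record last_check;

/*
 * Stubs named after the hooks used in the patch. They only record their
 * arguments; real KCSAN does far more than this.
 */
static void kcsan_check_write(const volatile void *ptr, size_t size)
{
	last_check = (struct check_record){ ptr, size, 0 };
}

static void kcsan_check_atomic_write(const volatile void *ptr, size_t size)
{
	last_check = (struct check_record){ ptr, size, 1 };
}

/* Mirrors set_bit(): the access is reported as an atomic write. */
static void demo_set_bit(long nr, volatile unsigned long *addr)
{
	assert(nr >= 0);
	kcsan_check_atomic_write(addr + BIT_WORD(nr), sizeof(long));
	/* Sketch only: this read-modify-write is NOT atomic. */
	addr[BIT_WORD(nr)] |= 1UL << (nr % BITS_PER_LONG);
}

/* Mirrors __set_bit(): the access is reported as a plain write. */
static void demo_nonatomic_set_bit(long nr, volatile unsigned long *addr)
{
	assert(nr >= 0);
	kcsan_check_write(addr + BIT_WORD(nr), sizeof(long));
	addr[BIT_WORD(nr)] |= 1UL << (nr % BITS_PER_LONG);
}
```

Calling demo_set_bit(BITS_PER_LONG + 3, map) on a two-word bitmap records
a check on &map[1] with is_atomic set, since the bit lives in the second
long. Calling demo_nonatomic_set_bit(1, map) records a plain check on
&map[0].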


* [PATCH v2 7/8] locking/atomics, kcsan: Add KCSAN instrumentation
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:13   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:13 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

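Add KCSAN instrumentation to atomic-instrumented.h, so that every
instrumented atomic operation is reported to KCSAN as an atomic access in
addition to the existing KASAN check. The header is generated, so
scripts/atomic/gen-atomic-instrumented.sh is updated to emit the new
checks.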

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Use kcsan_check{,_atomic}_{read,write} instead of
  kcsan_check_{access,atomic}.
* Introduce __atomic_check_{read,write} [Suggested by Mark Rutland].
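
As a rough illustration of the __atomic_check_{read,write} helpers noted
above, here is a userspace sketch under stated assumptions, not the
kernel code. The kasan_check_* and kcsan_check_atomic_* functions are
stubs that only bump counters. The counters, demo_atomic_t and the demo_*
functions are hypothetical. The helpers are renamed demo_atomic_check_*
because identifiers starting with a double underscore are reserved
outside the kernel. The point is that one call per access funnels into
both sanitizers, so each wrapper needs a single line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters so the sketch can show which hooks ran. */
static int kasan_reads, kasan_writes;
static int kcsan_atomic_reads, kcsan_atomic_writes;

/* Stubs named after the hooks in the patch; they only count calls. */
static void kasan_check_read(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	kasan_reads++;
}

static void kasan_check_write(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	kasan_writes++;
}

static void kcsan_check_atomic_read(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	kcsan_atomic_reads++;
}

static void kcsan_check_atomic_write(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	kcsan_atomic_writes++;
}

/* Same shape as __atomic_check_read() in the patch. */
static inline void demo_atomic_check_read(const volatile void *v, size_t size)
{
	assert(v != NULL);
	kasan_check_read(v, size);
	kcsan_check_atomic_read(v, size);
}

/* Same shape as __atomic_check_write() in the patch. */
static inline void demo_atomic_check_write(const volatile void *v, size_t size)
{
	assert(v != NULL);
	kasan_check_write(v, size);
	kcsan_check_atomic_write(v, size);
}

/* Stand-in for atomic_t; the accesses below are NOT actually atomic. */
typedef struct { int counter; } demo_atomic_t;

static inline int demo_atomic_read(const demo_atomic_t *v)
{
	demo_atomic_check_read(v, sizeof(*v));
	return v->counter;	/* arch_atomic_read() in the kernel */
}

static inline void demo_atomic_inc(demo_atomic_t *v)
{
	demo_atomic_check_write(v, sizeof(*v));
	v->counter++;		/* arch_atomic_inc() in the kernel */
}
```

After one demo_atomic_inc() followed by one demo_atomic_read(), each of
the four counters is 1: every wrapper reaches both the KASAN stub and the
KCSAN stub through a single helper call.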
---
 include/asm-generic/atomic-instrumented.h | 393 +++++++++++-----------
 scripts/atomic/gen-atomic-instrumented.sh |  17 +-
 2 files changed, 218 insertions(+), 192 deletions(-)

diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index e8730c6b9fe2..3dc0f38544f6 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -19,11 +19,24 @@
 
 #include <linux/build_bug.h>
 #include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
+
+static inline void __atomic_check_read(const volatile void *v, size_t size)
+{
+	kasan_check_read(v, size);
+	kcsan_check_atomic_read(v, size);
+}
+
+static inline void __atomic_check_write(const volatile void *v, size_t size)
+{
+	kasan_check_write(v, size);
+	kcsan_check_atomic_write(v, size);
+}
 
 static inline int
 atomic_read(const atomic_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic_read(v);
 }
 #define atomic_read atomic_read
@@ -32,7 +45,7 @@ atomic_read(const atomic_t *v)
 static inline int
 atomic_read_acquire(const atomic_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic_read_acquire(v);
 }
 #define atomic_read_acquire atomic_read_acquire
@@ -41,7 +54,7 @@ atomic_read_acquire(const atomic_t *v)
 static inline void
 atomic_set(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_set(v, i);
 }
 #define atomic_set atomic_set
@@ -50,7 +63,7 @@ atomic_set(atomic_t *v, int i)
 static inline void
 atomic_set_release(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_set_release(v, i);
 }
 #define atomic_set_release atomic_set_release
@@ -59,7 +72,7 @@ atomic_set_release(atomic_t *v, int i)
 static inline void
 atomic_add(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_add(i, v);
 }
 #define atomic_add atomic_add
@@ -68,7 +81,7 @@ atomic_add(int i, atomic_t *v)
 static inline int
 atomic_add_return(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return(i, v);
 }
 #define atomic_add_return atomic_add_return
@@ -78,7 +91,7 @@ atomic_add_return(int i, atomic_t *v)
 static inline int
 atomic_add_return_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_acquire(i, v);
 }
 #define atomic_add_return_acquire atomic_add_return_acquire
@@ -88,7 +101,7 @@ atomic_add_return_acquire(int i, atomic_t *v)
 static inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_release(i, v);
 }
 #define atomic_add_return_release atomic_add_return_release
@@ -98,7 +111,7 @@ atomic_add_return_release(int i, atomic_t *v)
 static inline int
 atomic_add_return_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_relaxed(i, v);
 }
 #define atomic_add_return_relaxed atomic_add_return_relaxed
@@ -108,7 +121,7 @@ atomic_add_return_relaxed(int i, atomic_t *v)
 static inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add(i, v);
 }
 #define atomic_fetch_add atomic_fetch_add
@@ -118,7 +131,7 @@ atomic_fetch_add(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_acquire(i, v);
 }
 #define atomic_fetch_add_acquire atomic_fetch_add_acquire
@@ -128,7 +141,7 @@ atomic_fetch_add_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_release(i, v);
 }
 #define atomic_fetch_add_release atomic_fetch_add_release
@@ -138,7 +151,7 @@ atomic_fetch_add_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_relaxed(i, v);
 }
 #define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
@@ -147,7 +160,7 @@ atomic_fetch_add_relaxed(int i, atomic_t *v)
 static inline void
 atomic_sub(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_sub(i, v);
 }
 #define atomic_sub atomic_sub
@@ -156,7 +169,7 @@ atomic_sub(int i, atomic_t *v)
 static inline int
 atomic_sub_return(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return(i, v);
 }
 #define atomic_sub_return atomic_sub_return
@@ -166,7 +179,7 @@ atomic_sub_return(int i, atomic_t *v)
 static inline int
 atomic_sub_return_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_acquire(i, v);
 }
 #define atomic_sub_return_acquire atomic_sub_return_acquire
@@ -176,7 +189,7 @@ atomic_sub_return_acquire(int i, atomic_t *v)
 static inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_release(i, v);
 }
 #define atomic_sub_return_release atomic_sub_return_release
@@ -186,7 +199,7 @@ atomic_sub_return_release(int i, atomic_t *v)
 static inline int
 atomic_sub_return_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_relaxed(i, v);
 }
 #define atomic_sub_return_relaxed atomic_sub_return_relaxed
@@ -196,7 +209,7 @@ atomic_sub_return_relaxed(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub(i, v);
 }
 #define atomic_fetch_sub atomic_fetch_sub
@@ -206,7 +219,7 @@ atomic_fetch_sub(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_acquire(i, v);
 }
 #define atomic_fetch_sub_acquire atomic_fetch_sub_acquire
@@ -216,7 +229,7 @@ atomic_fetch_sub_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_release(i, v);
 }
 #define atomic_fetch_sub_release atomic_fetch_sub_release
@@ -226,7 +239,7 @@ atomic_fetch_sub_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_relaxed(i, v);
 }
 #define atomic_fetch_sub_relaxed atomic_fetch_sub_relaxed
@@ -236,7 +249,7 @@ atomic_fetch_sub_relaxed(int i, atomic_t *v)
 static inline void
 atomic_inc(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_inc(v);
 }
 #define atomic_inc atomic_inc
@@ -246,7 +259,7 @@ atomic_inc(atomic_t *v)
 static inline int
 atomic_inc_return(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return(v);
 }
 #define atomic_inc_return atomic_inc_return
@@ -256,7 +269,7 @@ atomic_inc_return(atomic_t *v)
 static inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_acquire(v);
 }
 #define atomic_inc_return_acquire atomic_inc_return_acquire
@@ -266,7 +279,7 @@ atomic_inc_return_acquire(atomic_t *v)
 static inline int
 atomic_inc_return_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_release(v);
 }
 #define atomic_inc_return_release atomic_inc_return_release
@@ -276,7 +289,7 @@ atomic_inc_return_release(atomic_t *v)
 static inline int
 atomic_inc_return_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_relaxed(v);
 }
 #define atomic_inc_return_relaxed atomic_inc_return_relaxed
@@ -286,7 +299,7 @@ atomic_inc_return_relaxed(atomic_t *v)
 static inline int
 atomic_fetch_inc(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc(v);
 }
 #define atomic_fetch_inc atomic_fetch_inc
@@ -296,7 +309,7 @@ atomic_fetch_inc(atomic_t *v)
 static inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_acquire(v);
 }
 #define atomic_fetch_inc_acquire atomic_fetch_inc_acquire
@@ -306,7 +319,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 static inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_release(v);
 }
 #define atomic_fetch_inc_release atomic_fetch_inc_release
@@ -316,7 +329,7 @@ atomic_fetch_inc_release(atomic_t *v)
 static inline int
 atomic_fetch_inc_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_relaxed(v);
 }
 #define atomic_fetch_inc_relaxed atomic_fetch_inc_relaxed
@@ -326,7 +339,7 @@ atomic_fetch_inc_relaxed(atomic_t *v)
 static inline void
 atomic_dec(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_dec(v);
 }
 #define atomic_dec atomic_dec
@@ -336,7 +349,7 @@ atomic_dec(atomic_t *v)
 static inline int
 atomic_dec_return(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return(v);
 }
 #define atomic_dec_return atomic_dec_return
@@ -346,7 +359,7 @@ atomic_dec_return(atomic_t *v)
 static inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_acquire(v);
 }
 #define atomic_dec_return_acquire atomic_dec_return_acquire
@@ -356,7 +369,7 @@ atomic_dec_return_acquire(atomic_t *v)
 static inline int
 atomic_dec_return_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_release(v);
 }
 #define atomic_dec_return_release atomic_dec_return_release
@@ -366,7 +379,7 @@ atomic_dec_return_release(atomic_t *v)
 static inline int
 atomic_dec_return_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_relaxed(v);
 }
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
@@ -376,7 +389,7 @@ atomic_dec_return_relaxed(atomic_t *v)
 static inline int
 atomic_fetch_dec(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec(v);
 }
 #define atomic_fetch_dec atomic_fetch_dec
@@ -386,7 +399,7 @@ atomic_fetch_dec(atomic_t *v)
 static inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_acquire(v);
 }
 #define atomic_fetch_dec_acquire atomic_fetch_dec_acquire
@@ -396,7 +409,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 static inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_release(v);
 }
 #define atomic_fetch_dec_release atomic_fetch_dec_release
@@ -406,7 +419,7 @@ atomic_fetch_dec_release(atomic_t *v)
 static inline int
 atomic_fetch_dec_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_relaxed(v);
 }
 #define atomic_fetch_dec_relaxed atomic_fetch_dec_relaxed
@@ -415,7 +428,7 @@ atomic_fetch_dec_relaxed(atomic_t *v)
 static inline void
 atomic_and(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_and(i, v);
 }
 #define atomic_and atomic_and
@@ -424,7 +437,7 @@ atomic_and(int i, atomic_t *v)
 static inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and(i, v);
 }
 #define atomic_fetch_and atomic_fetch_and
@@ -434,7 +447,7 @@ atomic_fetch_and(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_acquire(i, v);
 }
 #define atomic_fetch_and_acquire atomic_fetch_and_acquire
@@ -444,7 +457,7 @@ atomic_fetch_and_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_release(i, v);
 }
 #define atomic_fetch_and_release atomic_fetch_and_release
@@ -454,7 +467,7 @@ atomic_fetch_and_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_relaxed(i, v);
 }
 #define atomic_fetch_and_relaxed atomic_fetch_and_relaxed
@@ -464,7 +477,7 @@ atomic_fetch_and_relaxed(int i, atomic_t *v)
 static inline void
 atomic_andnot(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_andnot(i, v);
 }
 #define atomic_andnot atomic_andnot
@@ -474,7 +487,7 @@ atomic_andnot(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot(i, v);
 }
 #define atomic_fetch_andnot atomic_fetch_andnot
@@ -484,7 +497,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_acquire(i, v);
 }
 #define atomic_fetch_andnot_acquire atomic_fetch_andnot_acquire
@@ -494,7 +507,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_release(i, v);
 }
 #define atomic_fetch_andnot_release atomic_fetch_andnot_release
@@ -504,7 +517,7 @@ atomic_fetch_andnot_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_relaxed(i, v);
 }
 #define atomic_fetch_andnot_relaxed atomic_fetch_andnot_relaxed
@@ -513,7 +526,7 @@ atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 static inline void
 atomic_or(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_or(i, v);
 }
 #define atomic_or atomic_or
@@ -522,7 +535,7 @@ atomic_or(int i, atomic_t *v)
 static inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or(i, v);
 }
 #define atomic_fetch_or atomic_fetch_or
@@ -532,7 +545,7 @@ atomic_fetch_or(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_acquire(i, v);
 }
 #define atomic_fetch_or_acquire atomic_fetch_or_acquire
@@ -542,7 +555,7 @@ atomic_fetch_or_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_release(i, v);
 }
 #define atomic_fetch_or_release atomic_fetch_or_release
@@ -552,7 +565,7 @@ atomic_fetch_or_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_relaxed(i, v);
 }
 #define atomic_fetch_or_relaxed atomic_fetch_or_relaxed
@@ -561,7 +574,7 @@ atomic_fetch_or_relaxed(int i, atomic_t *v)
 static inline void
 atomic_xor(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_xor(i, v);
 }
 #define atomic_xor atomic_xor
@@ -570,7 +583,7 @@ atomic_xor(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor(i, v);
 }
 #define atomic_fetch_xor atomic_fetch_xor
@@ -580,7 +593,7 @@ atomic_fetch_xor(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_acquire(i, v);
 }
 #define atomic_fetch_xor_acquire atomic_fetch_xor_acquire
@@ -590,7 +603,7 @@ atomic_fetch_xor_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_release(i, v);
 }
 #define atomic_fetch_xor_release atomic_fetch_xor_release
@@ -600,7 +613,7 @@ atomic_fetch_xor_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_relaxed(i, v);
 }
 #define atomic_fetch_xor_relaxed atomic_fetch_xor_relaxed
@@ -610,7 +623,7 @@ atomic_fetch_xor_relaxed(int i, atomic_t *v)
 static inline int
 atomic_xchg(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg(v, i);
 }
 #define atomic_xchg atomic_xchg
@@ -620,7 +633,7 @@ atomic_xchg(atomic_t *v, int i)
 static inline int
 atomic_xchg_acquire(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_acquire(v, i);
 }
 #define atomic_xchg_acquire atomic_xchg_acquire
@@ -630,7 +643,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
 static inline int
 atomic_xchg_release(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_release(v, i);
 }
 #define atomic_xchg_release atomic_xchg_release
@@ -640,7 +653,7 @@ atomic_xchg_release(atomic_t *v, int i)
 static inline int
 atomic_xchg_relaxed(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_relaxed(v, i);
 }
 #define atomic_xchg_relaxed atomic_xchg_relaxed
@@ -650,7 +663,7 @@ atomic_xchg_relaxed(atomic_t *v, int i)
 static inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg(v, old, new);
 }
 #define atomic_cmpxchg atomic_cmpxchg
@@ -660,7 +673,7 @@ atomic_cmpxchg(atomic_t *v, int old, int new)
 static inline int
 atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_acquire(v, old, new);
 }
 #define atomic_cmpxchg_acquire atomic_cmpxchg_acquire
@@ -670,7 +683,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 static inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_release(v, old, new);
 }
 #define atomic_cmpxchg_release atomic_cmpxchg_release
@@ -680,7 +693,7 @@ atomic_cmpxchg_release(atomic_t *v, int old, int new)
 static inline int
 atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_relaxed(v, old, new);
 }
 #define atomic_cmpxchg_relaxed atomic_cmpxchg_relaxed
@@ -690,8 +703,8 @@ atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 static inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg(v, old, new);
 }
 #define atomic_try_cmpxchg atomic_try_cmpxchg
@@ -701,8 +714,8 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 static inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_acquire(v, old, new);
 }
 #define atomic_try_cmpxchg_acquire atomic_try_cmpxchg_acquire
@@ -712,8 +725,8 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 static inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_release(v, old, new);
 }
 #define atomic_try_cmpxchg_release atomic_try_cmpxchg_release
@@ -723,8 +736,8 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 static inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_relaxed(v, old, new);
 }
 #define atomic_try_cmpxchg_relaxed atomic_try_cmpxchg_relaxed
@@ -734,7 +747,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 static inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_and_test(i, v);
 }
 #define atomic_sub_and_test atomic_sub_and_test
@@ -744,7 +757,7 @@ atomic_sub_and_test(int i, atomic_t *v)
 static inline bool
 atomic_dec_and_test(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_and_test(v);
 }
 #define atomic_dec_and_test atomic_dec_and_test
@@ -754,7 +767,7 @@ atomic_dec_and_test(atomic_t *v)
 static inline bool
 atomic_inc_and_test(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_and_test(v);
 }
 #define atomic_inc_and_test atomic_inc_and_test
@@ -764,7 +777,7 @@ atomic_inc_and_test(atomic_t *v)
 static inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_negative(i, v);
 }
 #define atomic_add_negative atomic_add_negative
@@ -774,7 +787,7 @@ atomic_add_negative(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_unless(v, a, u);
 }
 #define atomic_fetch_add_unless atomic_fetch_add_unless
@@ -784,7 +797,7 @@ atomic_fetch_add_unless(atomic_t *v, int a, int u)
 static inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_unless(v, a, u);
 }
 #define atomic_add_unless atomic_add_unless
@@ -794,7 +807,7 @@ atomic_add_unless(atomic_t *v, int a, int u)
 static inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_not_zero(v);
 }
 #define atomic_inc_not_zero atomic_inc_not_zero
@@ -804,7 +817,7 @@ atomic_inc_not_zero(atomic_t *v)
 static inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_unless_negative(v);
 }
 #define atomic_inc_unless_negative atomic_inc_unless_negative
@@ -814,7 +827,7 @@ atomic_inc_unless_negative(atomic_t *v)
 static inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_unless_positive(v);
 }
 #define atomic_dec_unless_positive atomic_dec_unless_positive
@@ -824,7 +837,7 @@ atomic_dec_unless_positive(atomic_t *v)
 static inline int
 atomic_dec_if_positive(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_if_positive(v);
 }
 #define atomic_dec_if_positive atomic_dec_if_positive
@@ -833,7 +846,7 @@ atomic_dec_if_positive(atomic_t *v)
 static inline s64
 atomic64_read(const atomic64_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic64_read(v);
 }
 #define atomic64_read atomic64_read
@@ -842,7 +855,7 @@ atomic64_read(const atomic64_t *v)
 static inline s64
 atomic64_read_acquire(const atomic64_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic64_read_acquire(v);
 }
 #define atomic64_read_acquire atomic64_read_acquire
@@ -851,7 +864,7 @@ atomic64_read_acquire(const atomic64_t *v)
 static inline void
 atomic64_set(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_set(v, i);
 }
 #define atomic64_set atomic64_set
@@ -860,7 +873,7 @@ atomic64_set(atomic64_t *v, s64 i)
 static inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_set_release(v, i);
 }
 #define atomic64_set_release atomic64_set_release
@@ -869,7 +882,7 @@ atomic64_set_release(atomic64_t *v, s64 i)
 static inline void
 atomic64_add(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_add(i, v);
 }
 #define atomic64_add atomic64_add
@@ -878,7 +891,7 @@ atomic64_add(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return(i, v);
 }
 #define atomic64_add_return atomic64_add_return
@@ -888,7 +901,7 @@ atomic64_add_return(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_acquire(i, v);
 }
 #define atomic64_add_return_acquire atomic64_add_return_acquire
@@ -898,7 +911,7 @@ atomic64_add_return_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_release(i, v);
 }
 #define atomic64_add_return_release atomic64_add_return_release
@@ -908,7 +921,7 @@ atomic64_add_return_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_relaxed(i, v);
 }
 #define atomic64_add_return_relaxed atomic64_add_return_relaxed
@@ -918,7 +931,7 @@ atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add(i, v);
 }
 #define atomic64_fetch_add atomic64_fetch_add
@@ -928,7 +941,7 @@ atomic64_fetch_add(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_acquire(i, v);
 }
 #define atomic64_fetch_add_acquire atomic64_fetch_add_acquire
@@ -938,7 +951,7 @@ atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_release(i, v);
 }
 #define atomic64_fetch_add_release atomic64_fetch_add_release
@@ -948,7 +961,7 @@ atomic64_fetch_add_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_relaxed(i, v);
 }
 #define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
@@ -957,7 +970,7 @@ atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_sub(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_sub(i, v);
 }
 #define atomic64_sub atomic64_sub
@@ -966,7 +979,7 @@ atomic64_sub(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return(i, v);
 }
 #define atomic64_sub_return atomic64_sub_return
@@ -976,7 +989,7 @@ atomic64_sub_return(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_acquire(i, v);
 }
 #define atomic64_sub_return_acquire atomic64_sub_return_acquire
@@ -986,7 +999,7 @@ atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_release(i, v);
 }
 #define atomic64_sub_return_release atomic64_sub_return_release
@@ -996,7 +1009,7 @@ atomic64_sub_return_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_relaxed(i, v);
 }
 #define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
@@ -1006,7 +1019,7 @@ atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub(i, v);
 }
 #define atomic64_fetch_sub atomic64_fetch_sub
@@ -1016,7 +1029,7 @@ atomic64_fetch_sub(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_acquire(i, v);
 }
 #define atomic64_fetch_sub_acquire atomic64_fetch_sub_acquire
@@ -1026,7 +1039,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_release(i, v);
 }
 #define atomic64_fetch_sub_release atomic64_fetch_sub_release
@@ -1036,7 +1049,7 @@ atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_relaxed(i, v);
 }
 #define atomic64_fetch_sub_relaxed atomic64_fetch_sub_relaxed
@@ -1046,7 +1059,7 @@ atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_inc(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_inc(v);
 }
 #define atomic64_inc atomic64_inc
@@ -1056,7 +1069,7 @@ atomic64_inc(atomic64_t *v)
 static inline s64
 atomic64_inc_return(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return(v);
 }
 #define atomic64_inc_return atomic64_inc_return
@@ -1066,7 +1079,7 @@ atomic64_inc_return(atomic64_t *v)
 static inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_acquire(v);
 }
 #define atomic64_inc_return_acquire atomic64_inc_return_acquire
@@ -1076,7 +1089,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
 static inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_release(v);
 }
 #define atomic64_inc_return_release atomic64_inc_return_release
@@ -1086,7 +1099,7 @@ atomic64_inc_return_release(atomic64_t *v)
 static inline s64
 atomic64_inc_return_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_relaxed(v);
 }
 #define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
@@ -1096,7 +1109,7 @@ atomic64_inc_return_relaxed(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc(v);
 }
 #define atomic64_fetch_inc atomic64_fetch_inc
@@ -1106,7 +1119,7 @@ atomic64_fetch_inc(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_acquire(v);
 }
 #define atomic64_fetch_inc_acquire atomic64_fetch_inc_acquire
@@ -1116,7 +1129,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_release(v);
 }
 #define atomic64_fetch_inc_release atomic64_fetch_inc_release
@@ -1126,7 +1139,7 @@ atomic64_fetch_inc_release(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_relaxed(v);
 }
 #define atomic64_fetch_inc_relaxed atomic64_fetch_inc_relaxed
@@ -1136,7 +1149,7 @@ atomic64_fetch_inc_relaxed(atomic64_t *v)
 static inline void
 atomic64_dec(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_dec(v);
 }
 #define atomic64_dec atomic64_dec
@@ -1146,7 +1159,7 @@ atomic64_dec(atomic64_t *v)
 static inline s64
 atomic64_dec_return(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return(v);
 }
 #define atomic64_dec_return atomic64_dec_return
@@ -1156,7 +1169,7 @@ atomic64_dec_return(atomic64_t *v)
 static inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_acquire(v);
 }
 #define atomic64_dec_return_acquire atomic64_dec_return_acquire
@@ -1166,7 +1179,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
 static inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_release(v);
 }
 #define atomic64_dec_return_release atomic64_dec_return_release
@@ -1176,7 +1189,7 @@ atomic64_dec_return_release(atomic64_t *v)
 static inline s64
 atomic64_dec_return_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_relaxed(v);
 }
 #define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
@@ -1186,7 +1199,7 @@ atomic64_dec_return_relaxed(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec(v);
 }
 #define atomic64_fetch_dec atomic64_fetch_dec
@@ -1196,7 +1209,7 @@ atomic64_fetch_dec(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_acquire(v);
 }
 #define atomic64_fetch_dec_acquire atomic64_fetch_dec_acquire
@@ -1206,7 +1219,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_release(v);
 }
 #define atomic64_fetch_dec_release atomic64_fetch_dec_release
@@ -1216,7 +1229,7 @@ atomic64_fetch_dec_release(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_relaxed(v);
 }
 #define atomic64_fetch_dec_relaxed atomic64_fetch_dec_relaxed
@@ -1225,7 +1238,7 @@ atomic64_fetch_dec_relaxed(atomic64_t *v)
 static inline void
 atomic64_and(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_and(i, v);
 }
 #define atomic64_and atomic64_and
@@ -1234,7 +1247,7 @@ atomic64_and(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and(i, v);
 }
 #define atomic64_fetch_and atomic64_fetch_and
@@ -1244,7 +1257,7 @@ atomic64_fetch_and(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_acquire(i, v);
 }
 #define atomic64_fetch_and_acquire atomic64_fetch_and_acquire
@@ -1254,7 +1267,7 @@ atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_release(i, v);
 }
 #define atomic64_fetch_and_release atomic64_fetch_and_release
@@ -1264,7 +1277,7 @@ atomic64_fetch_and_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_relaxed(i, v);
 }
 #define atomic64_fetch_and_relaxed atomic64_fetch_and_relaxed
@@ -1274,7 +1287,7 @@ atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_andnot(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_andnot(i, v);
 }
 #define atomic64_andnot atomic64_andnot
@@ -1284,7 +1297,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot(i, v);
 }
 #define atomic64_fetch_andnot atomic64_fetch_andnot
@@ -1294,7 +1307,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_acquire(i, v);
 }
 #define atomic64_fetch_andnot_acquire atomic64_fetch_andnot_acquire
@@ -1304,7 +1317,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_release(i, v);
 }
 #define atomic64_fetch_andnot_release atomic64_fetch_andnot_release
@@ -1314,7 +1327,7 @@ atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_relaxed(i, v);
 }
 #define atomic64_fetch_andnot_relaxed atomic64_fetch_andnot_relaxed
@@ -1323,7 +1336,7 @@ atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_or(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_or(i, v);
 }
 #define atomic64_or atomic64_or
@@ -1332,7 +1345,7 @@ atomic64_or(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or(i, v);
 }
 #define atomic64_fetch_or atomic64_fetch_or
@@ -1342,7 +1355,7 @@ atomic64_fetch_or(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_acquire(i, v);
 }
 #define atomic64_fetch_or_acquire atomic64_fetch_or_acquire
@@ -1352,7 +1365,7 @@ atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_release(i, v);
 }
 #define atomic64_fetch_or_release atomic64_fetch_or_release
@@ -1362,7 +1375,7 @@ atomic64_fetch_or_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_relaxed(i, v);
 }
 #define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
@@ -1371,7 +1384,7 @@ atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_xor(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_xor(i, v);
 }
 #define atomic64_xor atomic64_xor
@@ -1380,7 +1393,7 @@ atomic64_xor(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor(i, v);
 }
 #define atomic64_fetch_xor atomic64_fetch_xor
@@ -1390,7 +1403,7 @@ atomic64_fetch_xor(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_acquire(i, v);
 }
 #define atomic64_fetch_xor_acquire atomic64_fetch_xor_acquire
@@ -1400,7 +1413,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_release(i, v);
 }
 #define atomic64_fetch_xor_release atomic64_fetch_xor_release
@@ -1410,7 +1423,7 @@ atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_relaxed(i, v);
 }
 #define atomic64_fetch_xor_relaxed atomic64_fetch_xor_relaxed
@@ -1420,7 +1433,7 @@ atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 static inline s64
 atomic64_xchg(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg(v, i);
 }
 #define atomic64_xchg atomic64_xchg
@@ -1430,7 +1443,7 @@ atomic64_xchg(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_acquire(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_acquire(v, i);
 }
 #define atomic64_xchg_acquire atomic64_xchg_acquire
@@ -1440,7 +1453,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_release(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_release(v, i);
 }
 #define atomic64_xchg_release atomic64_xchg_release
@@ -1450,7 +1463,7 @@ atomic64_xchg_release(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_relaxed(v, i);
 }
 #define atomic64_xchg_relaxed atomic64_xchg_relaxed
@@ -1460,7 +1473,7 @@ atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 static inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg(v, old, new);
 }
 #define atomic64_cmpxchg atomic64_cmpxchg
@@ -1470,7 +1483,7 @@ atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 static inline s64
 atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_acquire(v, old, new);
 }
 #define atomic64_cmpxchg_acquire atomic64_cmpxchg_acquire
@@ -1480,7 +1493,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 static inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_release(v, old, new);
 }
 #define atomic64_cmpxchg_release atomic64_cmpxchg_release
@@ -1490,7 +1503,7 @@ atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 static inline s64
 atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_relaxed(v, old, new);
 }
 #define atomic64_cmpxchg_relaxed atomic64_cmpxchg_relaxed
@@ -1500,8 +1513,8 @@ atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 static inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg(v, old, new);
 }
 #define atomic64_try_cmpxchg atomic64_try_cmpxchg
@@ -1511,8 +1524,8 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_acquire(v, old, new);
 }
 #define atomic64_try_cmpxchg_acquire atomic64_try_cmpxchg_acquire
@@ -1522,8 +1535,8 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_release(v, old, new);
 }
 #define atomic64_try_cmpxchg_release atomic64_try_cmpxchg_release
@@ -1533,8 +1546,8 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
 }
 #define atomic64_try_cmpxchg_relaxed atomic64_try_cmpxchg_relaxed
@@ -1544,7 +1557,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_and_test(i, v);
 }
 #define atomic64_sub_and_test atomic64_sub_and_test
@@ -1554,7 +1567,7 @@ atomic64_sub_and_test(s64 i, atomic64_t *v)
 static inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_and_test(v);
 }
 #define atomic64_dec_and_test atomic64_dec_and_test
@@ -1564,7 +1577,7 @@ atomic64_dec_and_test(atomic64_t *v)
 static inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_and_test(v);
 }
 #define atomic64_inc_and_test atomic64_inc_and_test
@@ -1574,7 +1587,7 @@ atomic64_inc_and_test(atomic64_t *v)
 static inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_negative(i, v);
 }
 #define atomic64_add_negative atomic64_add_negative
@@ -1584,7 +1597,7 @@ atomic64_add_negative(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_unless(v, a, u);
 }
 #define atomic64_fetch_add_unless atomic64_fetch_add_unless
@@ -1594,7 +1607,7 @@ atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 static inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_unless(v, a, u);
 }
 #define atomic64_add_unless atomic64_add_unless
@@ -1604,7 +1617,7 @@ atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 static inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_not_zero(v);
 }
 #define atomic64_inc_not_zero atomic64_inc_not_zero
@@ -1614,7 +1627,7 @@ atomic64_inc_not_zero(atomic64_t *v)
 static inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_unless_negative(v);
 }
 #define atomic64_inc_unless_negative atomic64_inc_unless_negative
@@ -1624,7 +1637,7 @@ atomic64_inc_unless_negative(atomic64_t *v)
 static inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_unless_positive(v);
 }
 #define atomic64_dec_unless_positive atomic64_dec_unless_positive
@@ -1634,7 +1647,7 @@ atomic64_dec_unless_positive(atomic64_t *v)
 static inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_if_positive(v);
 }
 #define atomic64_dec_if_positive atomic64_dec_if_positive
@@ -1644,7 +1657,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1653,7 +1666,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1662,7 +1675,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1671,7 +1684,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1680,7 +1693,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1689,7 +1702,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1698,7 +1711,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1707,7 +1720,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1716,7 +1729,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1725,7 +1738,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1734,7 +1747,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1743,7 +1756,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1751,28 +1764,28 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define cmpxchg64_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define sync_cmpxchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define cmpxchg_double(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
 	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__);				\
 })
 
@@ -1780,9 +1793,9 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_double_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
 	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// b29b625d5de9280f680e42c7be859b55b15e5f6a
+// beea41c2a0f2c69e4958ed71bf26f59740fa4b12
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index e09812372b17..8b8b2a6f8d68 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -20,7 +20,7 @@ gen_param_check()
 	# We don't write to constant parameters
 	[ ${type#c} != ${type} ] && rw="read"
 
-	printf "\tkasan_check_${rw}(${name}, sizeof(*${name}));\n"
+	printf "\t__atomic_check_${rw}(${name}, sizeof(*${name}));\n"
 }
 
 #gen_param_check(arg...)
@@ -107,7 +107,7 @@ cat <<EOF
 #define ${xchg}(ptr, ...)						\\
 ({									\\
 	typeof(ptr) __ai_ptr = (ptr);					\\
-	kasan_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
+	__atomic_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
 	arch_${xchg}(__ai_ptr, __VA_ARGS__);				\\
 })
 EOF
@@ -148,6 +148,19 @@ cat << EOF
 
 #include <linux/build_bug.h>
 #include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
+
+static inline void __atomic_check_read(const volatile void *v, size_t size)
+{
+	kasan_check_read(v, size);
+	kcsan_check_atomic_read(v, size);
+}
+
+static inline void __atomic_check_write(const volatile void *v, size_t size)
+{
+	kasan_check_write(v, size);
+	kcsan_check_atomic_write(v, size);
+}
 
 EOF
 
-- 
2.23.0.866.gb869b98d4c-goog



* [PATCH v2 7/8] locking/atomics, kcsan: Add KCSAN instrumentation
@ 2019-10-17 14:13   ` Marco Elver
From: Marco Elver @ 2019-10-17 14:13 UTC
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

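This adds KCSAN instrumentation to atomic-instrumented.h. The new
__atomic_check_{read,write} helpers call both the KASAN and the KCSAN
checks, and the instrumented atomic wrappers now use these helpers
instead of calling kasan_check_{read,write} directly.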

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Use kcsan_check{,_atomic}_{read,write} instead of
  kcsan_check_{access,atomic}.
* Introduce __atomic_check_{read,write} [Suggested by Mark Rutland].
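
For illustration only, the wrapper pattern introduced here can be sketched
in userspace C. This is a minimal sketch, not the kernel code: the
kasan_check_write() and kcsan_check_atomic_write() bodies below are
hypothetical stand-ins that merely count calls, and atomic_t and
arch_atomic_set() are toy definitions. Only the shape of
__atomic_check_write() and of the instrumented atomic_set() follows the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real KASAN/KCSAN hooks: they only count calls. */
static int kasan_write_checks;
static int kcsan_atomic_write_checks;

static void kasan_check_write(const volatile void *p, size_t size)
{
	assert(p && size > 0);
	kasan_write_checks++;
}

static void kcsan_check_atomic_write(const volatile void *p, size_t size)
{
	assert(p && size > 0);
	kcsan_atomic_write_checks++;
}

/* Same shape as the helper in the diff: one call site, both sanitizers. */
static inline void __atomic_check_write(const volatile void *v, size_t size)
{
	kasan_check_write(v, size);
	kcsan_check_atomic_write(v, size);
}

/* Toy atomic type and arch-level operation; not the kernel's definitions. */
typedef struct { int counter; } atomic_t;

static inline void arch_atomic_set(atomic_t *v, int i)
{
	v->counter = i;
}

/* Instrumented wrapper: run the checks first, then the real operation. */
static inline void atomic_set(atomic_t *v, int i)
{
	__atomic_check_write(v, sizeof(*v));
	arch_atomic_set(v, i);
}
```

Calling atomic_set() once on a local atomic_t should bump both counters by
one. Routing every wrapper through a single helper means a further check
could be added in one place rather than in each generated function.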
---
 include/asm-generic/atomic-instrumented.h | 393 +++++++++++-----------
 scripts/atomic/gen-atomic-instrumented.sh |  17 +-
 2 files changed, 218 insertions(+), 192 deletions(-)

diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index e8730c6b9fe2..3dc0f38544f6 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -19,11 +19,24 @@
 
 #include <linux/build_bug.h>
 #include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
+
+static inline void __atomic_check_read(const volatile void *v, size_t size)
+{
+	kasan_check_read(v, size);
+	kcsan_check_atomic_read(v, size);
+}
+
+static inline void __atomic_check_write(const volatile void *v, size_t size)
+{
+	kasan_check_write(v, size);
+	kcsan_check_atomic_write(v, size);
+}
 
 static inline int
 atomic_read(const atomic_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic_read(v);
 }
 #define atomic_read atomic_read
@@ -32,7 +45,7 @@ atomic_read(const atomic_t *v)
 static inline int
 atomic_read_acquire(const atomic_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic_read_acquire(v);
 }
 #define atomic_read_acquire atomic_read_acquire
@@ -41,7 +54,7 @@ atomic_read_acquire(const atomic_t *v)
 static inline void
 atomic_set(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_set(v, i);
 }
 #define atomic_set atomic_set
@@ -50,7 +63,7 @@ atomic_set(atomic_t *v, int i)
 static inline void
 atomic_set_release(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_set_release(v, i);
 }
 #define atomic_set_release atomic_set_release
@@ -59,7 +72,7 @@ atomic_set_release(atomic_t *v, int i)
 static inline void
 atomic_add(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_add(i, v);
 }
 #define atomic_add atomic_add
@@ -68,7 +81,7 @@ atomic_add(int i, atomic_t *v)
 static inline int
 atomic_add_return(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return(i, v);
 }
 #define atomic_add_return atomic_add_return
@@ -78,7 +91,7 @@ atomic_add_return(int i, atomic_t *v)
 static inline int
 atomic_add_return_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_acquire(i, v);
 }
 #define atomic_add_return_acquire atomic_add_return_acquire
@@ -88,7 +101,7 @@ atomic_add_return_acquire(int i, atomic_t *v)
 static inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_release(i, v);
 }
 #define atomic_add_return_release atomic_add_return_release
@@ -98,7 +111,7 @@ atomic_add_return_release(int i, atomic_t *v)
 static inline int
 atomic_add_return_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_relaxed(i, v);
 }
 #define atomic_add_return_relaxed atomic_add_return_relaxed
@@ -108,7 +121,7 @@ atomic_add_return_relaxed(int i, atomic_t *v)
 static inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add(i, v);
 }
 #define atomic_fetch_add atomic_fetch_add
@@ -118,7 +131,7 @@ atomic_fetch_add(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_acquire(i, v);
 }
 #define atomic_fetch_add_acquire atomic_fetch_add_acquire
@@ -128,7 +141,7 @@ atomic_fetch_add_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_release(i, v);
 }
 #define atomic_fetch_add_release atomic_fetch_add_release
@@ -138,7 +151,7 @@ atomic_fetch_add_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_relaxed(i, v);
 }
 #define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
@@ -147,7 +160,7 @@ atomic_fetch_add_relaxed(int i, atomic_t *v)
 static inline void
 atomic_sub(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_sub(i, v);
 }
 #define atomic_sub atomic_sub
@@ -156,7 +169,7 @@ atomic_sub(int i, atomic_t *v)
 static inline int
 atomic_sub_return(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return(i, v);
 }
 #define atomic_sub_return atomic_sub_return
@@ -166,7 +179,7 @@ atomic_sub_return(int i, atomic_t *v)
 static inline int
 atomic_sub_return_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_acquire(i, v);
 }
 #define atomic_sub_return_acquire atomic_sub_return_acquire
@@ -176,7 +189,7 @@ atomic_sub_return_acquire(int i, atomic_t *v)
 static inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_release(i, v);
 }
 #define atomic_sub_return_release atomic_sub_return_release
@@ -186,7 +199,7 @@ atomic_sub_return_release(int i, atomic_t *v)
 static inline int
 atomic_sub_return_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_relaxed(i, v);
 }
 #define atomic_sub_return_relaxed atomic_sub_return_relaxed
@@ -196,7 +209,7 @@ atomic_sub_return_relaxed(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub(i, v);
 }
 #define atomic_fetch_sub atomic_fetch_sub
@@ -206,7 +219,7 @@ atomic_fetch_sub(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_acquire(i, v);
 }
 #define atomic_fetch_sub_acquire atomic_fetch_sub_acquire
@@ -216,7 +229,7 @@ atomic_fetch_sub_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_release(i, v);
 }
 #define atomic_fetch_sub_release atomic_fetch_sub_release
@@ -226,7 +239,7 @@ atomic_fetch_sub_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_relaxed(i, v);
 }
 #define atomic_fetch_sub_relaxed atomic_fetch_sub_relaxed
@@ -236,7 +249,7 @@ atomic_fetch_sub_relaxed(int i, atomic_t *v)
 static inline void
 atomic_inc(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_inc(v);
 }
 #define atomic_inc atomic_inc
@@ -246,7 +259,7 @@ atomic_inc(atomic_t *v)
 static inline int
 atomic_inc_return(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return(v);
 }
 #define atomic_inc_return atomic_inc_return
@@ -256,7 +269,7 @@ atomic_inc_return(atomic_t *v)
 static inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_acquire(v);
 }
 #define atomic_inc_return_acquire atomic_inc_return_acquire
@@ -266,7 +279,7 @@ atomic_inc_return_acquire(atomic_t *v)
 static inline int
 atomic_inc_return_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_release(v);
 }
 #define atomic_inc_return_release atomic_inc_return_release
@@ -276,7 +289,7 @@ atomic_inc_return_release(atomic_t *v)
 static inline int
 atomic_inc_return_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_relaxed(v);
 }
 #define atomic_inc_return_relaxed atomic_inc_return_relaxed
@@ -286,7 +299,7 @@ atomic_inc_return_relaxed(atomic_t *v)
 static inline int
 atomic_fetch_inc(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc(v);
 }
 #define atomic_fetch_inc atomic_fetch_inc
@@ -296,7 +309,7 @@ atomic_fetch_inc(atomic_t *v)
 static inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_acquire(v);
 }
 #define atomic_fetch_inc_acquire atomic_fetch_inc_acquire
@@ -306,7 +319,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 static inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_release(v);
 }
 #define atomic_fetch_inc_release atomic_fetch_inc_release
@@ -316,7 +329,7 @@ atomic_fetch_inc_release(atomic_t *v)
 static inline int
 atomic_fetch_inc_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_relaxed(v);
 }
 #define atomic_fetch_inc_relaxed atomic_fetch_inc_relaxed
@@ -326,7 +339,7 @@ atomic_fetch_inc_relaxed(atomic_t *v)
 static inline void
 atomic_dec(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_dec(v);
 }
 #define atomic_dec atomic_dec
@@ -336,7 +349,7 @@ atomic_dec(atomic_t *v)
 static inline int
 atomic_dec_return(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return(v);
 }
 #define atomic_dec_return atomic_dec_return
@@ -346,7 +359,7 @@ atomic_dec_return(atomic_t *v)
 static inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_acquire(v);
 }
 #define atomic_dec_return_acquire atomic_dec_return_acquire
@@ -356,7 +369,7 @@ atomic_dec_return_acquire(atomic_t *v)
 static inline int
 atomic_dec_return_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_release(v);
 }
 #define atomic_dec_return_release atomic_dec_return_release
@@ -366,7 +379,7 @@ atomic_dec_return_release(atomic_t *v)
 static inline int
 atomic_dec_return_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_relaxed(v);
 }
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
@@ -376,7 +389,7 @@ atomic_dec_return_relaxed(atomic_t *v)
 static inline int
 atomic_fetch_dec(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec(v);
 }
 #define atomic_fetch_dec atomic_fetch_dec
@@ -386,7 +399,7 @@ atomic_fetch_dec(atomic_t *v)
 static inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_acquire(v);
 }
 #define atomic_fetch_dec_acquire atomic_fetch_dec_acquire
@@ -396,7 +409,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 static inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_release(v);
 }
 #define atomic_fetch_dec_release atomic_fetch_dec_release
@@ -406,7 +419,7 @@ atomic_fetch_dec_release(atomic_t *v)
 static inline int
 atomic_fetch_dec_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_relaxed(v);
 }
 #define atomic_fetch_dec_relaxed atomic_fetch_dec_relaxed
@@ -415,7 +428,7 @@ atomic_fetch_dec_relaxed(atomic_t *v)
 static inline void
 atomic_and(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_and(i, v);
 }
 #define atomic_and atomic_and
@@ -424,7 +437,7 @@ atomic_and(int i, atomic_t *v)
 static inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and(i, v);
 }
 #define atomic_fetch_and atomic_fetch_and
@@ -434,7 +447,7 @@ atomic_fetch_and(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_acquire(i, v);
 }
 #define atomic_fetch_and_acquire atomic_fetch_and_acquire
@@ -444,7 +457,7 @@ atomic_fetch_and_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_release(i, v);
 }
 #define atomic_fetch_and_release atomic_fetch_and_release
@@ -454,7 +467,7 @@ atomic_fetch_and_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_relaxed(i, v);
 }
 #define atomic_fetch_and_relaxed atomic_fetch_and_relaxed
@@ -464,7 +477,7 @@ atomic_fetch_and_relaxed(int i, atomic_t *v)
 static inline void
 atomic_andnot(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_andnot(i, v);
 }
 #define atomic_andnot atomic_andnot
@@ -474,7 +487,7 @@ atomic_andnot(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot(i, v);
 }
 #define atomic_fetch_andnot atomic_fetch_andnot
@@ -484,7 +497,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_acquire(i, v);
 }
 #define atomic_fetch_andnot_acquire atomic_fetch_andnot_acquire
@@ -494,7 +507,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_release(i, v);
 }
 #define atomic_fetch_andnot_release atomic_fetch_andnot_release
@@ -504,7 +517,7 @@ atomic_fetch_andnot_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_relaxed(i, v);
 }
 #define atomic_fetch_andnot_relaxed atomic_fetch_andnot_relaxed
@@ -513,7 +526,7 @@ atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 static inline void
 atomic_or(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_or(i, v);
 }
 #define atomic_or atomic_or
@@ -522,7 +535,7 @@ atomic_or(int i, atomic_t *v)
 static inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or(i, v);
 }
 #define atomic_fetch_or atomic_fetch_or
@@ -532,7 +545,7 @@ atomic_fetch_or(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_acquire(i, v);
 }
 #define atomic_fetch_or_acquire atomic_fetch_or_acquire
@@ -542,7 +555,7 @@ atomic_fetch_or_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_release(i, v);
 }
 #define atomic_fetch_or_release atomic_fetch_or_release
@@ -552,7 +565,7 @@ atomic_fetch_or_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_relaxed(i, v);
 }
 #define atomic_fetch_or_relaxed atomic_fetch_or_relaxed
@@ -561,7 +574,7 @@ atomic_fetch_or_relaxed(int i, atomic_t *v)
 static inline void
 atomic_xor(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_xor(i, v);
 }
 #define atomic_xor atomic_xor
@@ -570,7 +583,7 @@ atomic_xor(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor(i, v);
 }
 #define atomic_fetch_xor atomic_fetch_xor
@@ -580,7 +593,7 @@ atomic_fetch_xor(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_acquire(i, v);
 }
 #define atomic_fetch_xor_acquire atomic_fetch_xor_acquire
@@ -590,7 +603,7 @@ atomic_fetch_xor_acquire(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_release(i, v);
 }
 #define atomic_fetch_xor_release atomic_fetch_xor_release
@@ -600,7 +613,7 @@ atomic_fetch_xor_release(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_relaxed(i, v);
 }
 #define atomic_fetch_xor_relaxed atomic_fetch_xor_relaxed
@@ -610,7 +623,7 @@ atomic_fetch_xor_relaxed(int i, atomic_t *v)
 static inline int
 atomic_xchg(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg(v, i);
 }
 #define atomic_xchg atomic_xchg
@@ -620,7 +633,7 @@ atomic_xchg(atomic_t *v, int i)
 static inline int
 atomic_xchg_acquire(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_acquire(v, i);
 }
 #define atomic_xchg_acquire atomic_xchg_acquire
@@ -630,7 +643,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
 static inline int
 atomic_xchg_release(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_release(v, i);
 }
 #define atomic_xchg_release atomic_xchg_release
@@ -640,7 +653,7 @@ atomic_xchg_release(atomic_t *v, int i)
 static inline int
 atomic_xchg_relaxed(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_relaxed(v, i);
 }
 #define atomic_xchg_relaxed atomic_xchg_relaxed
@@ -650,7 +663,7 @@ atomic_xchg_relaxed(atomic_t *v, int i)
 static inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg(v, old, new);
 }
 #define atomic_cmpxchg atomic_cmpxchg
@@ -660,7 +673,7 @@ atomic_cmpxchg(atomic_t *v, int old, int new)
 static inline int
 atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_acquire(v, old, new);
 }
 #define atomic_cmpxchg_acquire atomic_cmpxchg_acquire
@@ -670,7 +683,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 static inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_release(v, old, new);
 }
 #define atomic_cmpxchg_release atomic_cmpxchg_release
@@ -680,7 +693,7 @@ atomic_cmpxchg_release(atomic_t *v, int old, int new)
 static inline int
 atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_relaxed(v, old, new);
 }
 #define atomic_cmpxchg_relaxed atomic_cmpxchg_relaxed
@@ -690,8 +703,8 @@ atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 static inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg(v, old, new);
 }
 #define atomic_try_cmpxchg atomic_try_cmpxchg
@@ -701,8 +714,8 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 static inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_acquire(v, old, new);
 }
 #define atomic_try_cmpxchg_acquire atomic_try_cmpxchg_acquire
@@ -712,8 +725,8 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 static inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_release(v, old, new);
 }
 #define atomic_try_cmpxchg_release atomic_try_cmpxchg_release
@@ -723,8 +736,8 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 static inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_relaxed(v, old, new);
 }
 #define atomic_try_cmpxchg_relaxed atomic_try_cmpxchg_relaxed
@@ -734,7 +747,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 static inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_and_test(i, v);
 }
 #define atomic_sub_and_test atomic_sub_and_test
@@ -744,7 +757,7 @@ atomic_sub_and_test(int i, atomic_t *v)
 static inline bool
 atomic_dec_and_test(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_and_test(v);
 }
 #define atomic_dec_and_test atomic_dec_and_test
@@ -754,7 +767,7 @@ atomic_dec_and_test(atomic_t *v)
 static inline bool
 atomic_inc_and_test(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_and_test(v);
 }
 #define atomic_inc_and_test atomic_inc_and_test
@@ -764,7 +777,7 @@ atomic_inc_and_test(atomic_t *v)
 static inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_negative(i, v);
 }
 #define atomic_add_negative atomic_add_negative
@@ -774,7 +787,7 @@ atomic_add_negative(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_unless(v, a, u);
 }
 #define atomic_fetch_add_unless atomic_fetch_add_unless
@@ -784,7 +797,7 @@ atomic_fetch_add_unless(atomic_t *v, int a, int u)
 static inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_unless(v, a, u);
 }
 #define atomic_add_unless atomic_add_unless
@@ -794,7 +807,7 @@ atomic_add_unless(atomic_t *v, int a, int u)
 static inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_not_zero(v);
 }
 #define atomic_inc_not_zero atomic_inc_not_zero
@@ -804,7 +817,7 @@ atomic_inc_not_zero(atomic_t *v)
 static inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_unless_negative(v);
 }
 #define atomic_inc_unless_negative atomic_inc_unless_negative
@@ -814,7 +827,7 @@ atomic_inc_unless_negative(atomic_t *v)
 static inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_unless_positive(v);
 }
 #define atomic_dec_unless_positive atomic_dec_unless_positive
@@ -824,7 +837,7 @@ atomic_dec_unless_positive(atomic_t *v)
 static inline int
 atomic_dec_if_positive(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_if_positive(v);
 }
 #define atomic_dec_if_positive atomic_dec_if_positive
@@ -833,7 +846,7 @@ atomic_dec_if_positive(atomic_t *v)
 static inline s64
 atomic64_read(const atomic64_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic64_read(v);
 }
 #define atomic64_read atomic64_read
@@ -842,7 +855,7 @@ atomic64_read(const atomic64_t *v)
 static inline s64
 atomic64_read_acquire(const atomic64_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic64_read_acquire(v);
 }
 #define atomic64_read_acquire atomic64_read_acquire
@@ -851,7 +864,7 @@ atomic64_read_acquire(const atomic64_t *v)
 static inline void
 atomic64_set(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_set(v, i);
 }
 #define atomic64_set atomic64_set
@@ -860,7 +873,7 @@ atomic64_set(atomic64_t *v, s64 i)
 static inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_set_release(v, i);
 }
 #define atomic64_set_release atomic64_set_release
@@ -869,7 +882,7 @@ atomic64_set_release(atomic64_t *v, s64 i)
 static inline void
 atomic64_add(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_add(i, v);
 }
 #define atomic64_add atomic64_add
@@ -878,7 +891,7 @@ atomic64_add(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return(i, v);
 }
 #define atomic64_add_return atomic64_add_return
@@ -888,7 +901,7 @@ atomic64_add_return(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_acquire(i, v);
 }
 #define atomic64_add_return_acquire atomic64_add_return_acquire
@@ -898,7 +911,7 @@ atomic64_add_return_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_release(i, v);
 }
 #define atomic64_add_return_release atomic64_add_return_release
@@ -908,7 +921,7 @@ atomic64_add_return_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_relaxed(i, v);
 }
 #define atomic64_add_return_relaxed atomic64_add_return_relaxed
@@ -918,7 +931,7 @@ atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add(i, v);
 }
 #define atomic64_fetch_add atomic64_fetch_add
@@ -928,7 +941,7 @@ atomic64_fetch_add(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_acquire(i, v);
 }
 #define atomic64_fetch_add_acquire atomic64_fetch_add_acquire
@@ -938,7 +951,7 @@ atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_release(i, v);
 }
 #define atomic64_fetch_add_release atomic64_fetch_add_release
@@ -948,7 +961,7 @@ atomic64_fetch_add_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_relaxed(i, v);
 }
 #define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
@@ -957,7 +970,7 @@ atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_sub(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_sub(i, v);
 }
 #define atomic64_sub atomic64_sub
@@ -966,7 +979,7 @@ atomic64_sub(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return(i, v);
 }
 #define atomic64_sub_return atomic64_sub_return
@@ -976,7 +989,7 @@ atomic64_sub_return(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_acquire(i, v);
 }
 #define atomic64_sub_return_acquire atomic64_sub_return_acquire
@@ -986,7 +999,7 @@ atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_release(i, v);
 }
 #define atomic64_sub_return_release atomic64_sub_return_release
@@ -996,7 +1009,7 @@ atomic64_sub_return_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_relaxed(i, v);
 }
 #define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
@@ -1006,7 +1019,7 @@ atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub(i, v);
 }
 #define atomic64_fetch_sub atomic64_fetch_sub
@@ -1016,7 +1029,7 @@ atomic64_fetch_sub(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_acquire(i, v);
 }
 #define atomic64_fetch_sub_acquire atomic64_fetch_sub_acquire
@@ -1026,7 +1039,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_release(i, v);
 }
 #define atomic64_fetch_sub_release atomic64_fetch_sub_release
@@ -1036,7 +1049,7 @@ atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_relaxed(i, v);
 }
 #define atomic64_fetch_sub_relaxed atomic64_fetch_sub_relaxed
@@ -1046,7 +1059,7 @@ atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_inc(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_inc(v);
 }
 #define atomic64_inc atomic64_inc
@@ -1056,7 +1069,7 @@ atomic64_inc(atomic64_t *v)
 static inline s64
 atomic64_inc_return(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return(v);
 }
 #define atomic64_inc_return atomic64_inc_return
@@ -1066,7 +1079,7 @@ atomic64_inc_return(atomic64_t *v)
 static inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_acquire(v);
 }
 #define atomic64_inc_return_acquire atomic64_inc_return_acquire
@@ -1076,7 +1089,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
 static inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_release(v);
 }
 #define atomic64_inc_return_release atomic64_inc_return_release
@@ -1086,7 +1099,7 @@ atomic64_inc_return_release(atomic64_t *v)
 static inline s64
 atomic64_inc_return_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_relaxed(v);
 }
 #define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
@@ -1096,7 +1109,7 @@ atomic64_inc_return_relaxed(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc(v);
 }
 #define atomic64_fetch_inc atomic64_fetch_inc
@@ -1106,7 +1119,7 @@ atomic64_fetch_inc(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_acquire(v);
 }
 #define atomic64_fetch_inc_acquire atomic64_fetch_inc_acquire
@@ -1116,7 +1129,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_release(v);
 }
 #define atomic64_fetch_inc_release atomic64_fetch_inc_release
@@ -1126,7 +1139,7 @@ atomic64_fetch_inc_release(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_relaxed(v);
 }
 #define atomic64_fetch_inc_relaxed atomic64_fetch_inc_relaxed
@@ -1136,7 +1149,7 @@ atomic64_fetch_inc_relaxed(atomic64_t *v)
 static inline void
 atomic64_dec(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_dec(v);
 }
 #define atomic64_dec atomic64_dec
@@ -1146,7 +1159,7 @@ atomic64_dec(atomic64_t *v)
 static inline s64
 atomic64_dec_return(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return(v);
 }
 #define atomic64_dec_return atomic64_dec_return
@@ -1156,7 +1169,7 @@ atomic64_dec_return(atomic64_t *v)
 static inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_acquire(v);
 }
 #define atomic64_dec_return_acquire atomic64_dec_return_acquire
@@ -1166,7 +1179,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
 static inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_release(v);
 }
 #define atomic64_dec_return_release atomic64_dec_return_release
@@ -1176,7 +1189,7 @@ atomic64_dec_return_release(atomic64_t *v)
 static inline s64
 atomic64_dec_return_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_relaxed(v);
 }
 #define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
@@ -1186,7 +1199,7 @@ atomic64_dec_return_relaxed(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec(v);
 }
 #define atomic64_fetch_dec atomic64_fetch_dec
@@ -1196,7 +1209,7 @@ atomic64_fetch_dec(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_acquire(v);
 }
 #define atomic64_fetch_dec_acquire atomic64_fetch_dec_acquire
@@ -1206,7 +1219,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_release(v);
 }
 #define atomic64_fetch_dec_release atomic64_fetch_dec_release
@@ -1216,7 +1229,7 @@ atomic64_fetch_dec_release(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_relaxed(v);
 }
 #define atomic64_fetch_dec_relaxed atomic64_fetch_dec_relaxed
@@ -1225,7 +1238,7 @@ atomic64_fetch_dec_relaxed(atomic64_t *v)
 static inline void
 atomic64_and(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_and(i, v);
 }
 #define atomic64_and atomic64_and
@@ -1234,7 +1247,7 @@ atomic64_and(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and(i, v);
 }
 #define atomic64_fetch_and atomic64_fetch_and
@@ -1244,7 +1257,7 @@ atomic64_fetch_and(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_acquire(i, v);
 }
 #define atomic64_fetch_and_acquire atomic64_fetch_and_acquire
@@ -1254,7 +1267,7 @@ atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_release(i, v);
 }
 #define atomic64_fetch_and_release atomic64_fetch_and_release
@@ -1264,7 +1277,7 @@ atomic64_fetch_and_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_relaxed(i, v);
 }
 #define atomic64_fetch_and_relaxed atomic64_fetch_and_relaxed
@@ -1274,7 +1287,7 @@ atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_andnot(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_andnot(i, v);
 }
 #define atomic64_andnot atomic64_andnot
@@ -1284,7 +1297,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot(i, v);
 }
 #define atomic64_fetch_andnot atomic64_fetch_andnot
@@ -1294,7 +1307,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_acquire(i, v);
 }
 #define atomic64_fetch_andnot_acquire atomic64_fetch_andnot_acquire
@@ -1304,7 +1317,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_release(i, v);
 }
 #define atomic64_fetch_andnot_release atomic64_fetch_andnot_release
@@ -1314,7 +1327,7 @@ atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_relaxed(i, v);
 }
 #define atomic64_fetch_andnot_relaxed atomic64_fetch_andnot_relaxed
@@ -1323,7 +1336,7 @@ atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_or(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_or(i, v);
 }
 #define atomic64_or atomic64_or
@@ -1332,7 +1345,7 @@ atomic64_or(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or(i, v);
 }
 #define atomic64_fetch_or atomic64_fetch_or
@@ -1342,7 +1355,7 @@ atomic64_fetch_or(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_acquire(i, v);
 }
 #define atomic64_fetch_or_acquire atomic64_fetch_or_acquire
@@ -1352,7 +1365,7 @@ atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_release(i, v);
 }
 #define atomic64_fetch_or_release atomic64_fetch_or_release
@@ -1362,7 +1375,7 @@ atomic64_fetch_or_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_relaxed(i, v);
 }
 #define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
@@ -1371,7 +1384,7 @@ atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 static inline void
 atomic64_xor(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_xor(i, v);
 }
 #define atomic64_xor atomic64_xor
@@ -1380,7 +1393,7 @@ atomic64_xor(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor(i, v);
 }
 #define atomic64_fetch_xor atomic64_fetch_xor
@@ -1390,7 +1403,7 @@ atomic64_fetch_xor(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_acquire(i, v);
 }
 #define atomic64_fetch_xor_acquire atomic64_fetch_xor_acquire
@@ -1400,7 +1413,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_release(i, v);
 }
 #define atomic64_fetch_xor_release atomic64_fetch_xor_release
@@ -1410,7 +1423,7 @@ atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_relaxed(i, v);
 }
 #define atomic64_fetch_xor_relaxed atomic64_fetch_xor_relaxed
@@ -1420,7 +1433,7 @@ atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 static inline s64
 atomic64_xchg(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg(v, i);
 }
 #define atomic64_xchg atomic64_xchg
@@ -1430,7 +1443,7 @@ atomic64_xchg(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_acquire(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_acquire(v, i);
 }
 #define atomic64_xchg_acquire atomic64_xchg_acquire
@@ -1440,7 +1453,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_release(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_release(v, i);
 }
 #define atomic64_xchg_release atomic64_xchg_release
@@ -1450,7 +1463,7 @@ atomic64_xchg_release(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_relaxed(v, i);
 }
 #define atomic64_xchg_relaxed atomic64_xchg_relaxed
@@ -1460,7 +1473,7 @@ atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 static inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg(v, old, new);
 }
 #define atomic64_cmpxchg atomic64_cmpxchg
@@ -1470,7 +1483,7 @@ atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 static inline s64
 atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_acquire(v, old, new);
 }
 #define atomic64_cmpxchg_acquire atomic64_cmpxchg_acquire
@@ -1480,7 +1493,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 static inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_release(v, old, new);
 }
 #define atomic64_cmpxchg_release atomic64_cmpxchg_release
@@ -1490,7 +1503,7 @@ atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 static inline s64
 atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_relaxed(v, old, new);
 }
 #define atomic64_cmpxchg_relaxed atomic64_cmpxchg_relaxed
@@ -1500,8 +1513,8 @@ atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 static inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg(v, old, new);
 }
 #define atomic64_try_cmpxchg atomic64_try_cmpxchg
@@ -1511,8 +1524,8 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_acquire(v, old, new);
 }
 #define atomic64_try_cmpxchg_acquire atomic64_try_cmpxchg_acquire
@@ -1522,8 +1535,8 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_release(v, old, new);
 }
 #define atomic64_try_cmpxchg_release atomic64_try_cmpxchg_release
@@ -1533,8 +1546,8 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
 }
 #define atomic64_try_cmpxchg_relaxed atomic64_try_cmpxchg_relaxed
@@ -1544,7 +1557,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 static inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_and_test(i, v);
 }
 #define atomic64_sub_and_test atomic64_sub_and_test
@@ -1554,7 +1567,7 @@ atomic64_sub_and_test(s64 i, atomic64_t *v)
 static inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_and_test(v);
 }
 #define atomic64_dec_and_test atomic64_dec_and_test
@@ -1564,7 +1577,7 @@ atomic64_dec_and_test(atomic64_t *v)
 static inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_and_test(v);
 }
 #define atomic64_inc_and_test atomic64_inc_and_test
@@ -1574,7 +1587,7 @@ atomic64_inc_and_test(atomic64_t *v)
 static inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_negative(i, v);
 }
 #define atomic64_add_negative atomic64_add_negative
@@ -1584,7 +1597,7 @@ atomic64_add_negative(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_unless(v, a, u);
 }
 #define atomic64_fetch_add_unless atomic64_fetch_add_unless
@@ -1594,7 +1607,7 @@ atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 static inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_unless(v, a, u);
 }
 #define atomic64_add_unless atomic64_add_unless
@@ -1604,7 +1617,7 @@ atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 static inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_not_zero(v);
 }
 #define atomic64_inc_not_zero atomic64_inc_not_zero
@@ -1614,7 +1627,7 @@ atomic64_inc_not_zero(atomic64_t *v)
 static inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_unless_negative(v);
 }
 #define atomic64_inc_unless_negative atomic64_inc_unless_negative
@@ -1624,7 +1637,7 @@ atomic64_inc_unless_negative(atomic64_t *v)
 static inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_unless_positive(v);
 }
 #define atomic64_dec_unless_positive atomic64_dec_unless_positive
@@ -1634,7 +1647,7 @@ atomic64_dec_unless_positive(atomic64_t *v)
 static inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_if_positive(v);
 }
 #define atomic64_dec_if_positive atomic64_dec_if_positive
@@ -1644,7 +1657,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1653,7 +1666,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1662,7 +1675,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1671,7 +1684,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1680,7 +1693,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1689,7 +1702,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1698,7 +1711,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1707,7 +1720,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1716,7 +1729,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1725,7 +1738,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1734,7 +1747,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1743,7 +1756,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1751,28 +1764,28 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define cmpxchg64_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define sync_cmpxchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define cmpxchg_double(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
 	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__);				\
 })
 
@@ -1780,9 +1793,9 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_double_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
 	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// b29b625d5de9280f680e42c7be859b55b15e5f6a
+// beea41c2a0f2c69e4958ed71bf26f59740fa4b12
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index e09812372b17..8b8b2a6f8d68 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -20,7 +20,7 @@ gen_param_check()
 	# We don't write to constant parameters
 	[ ${type#c} != ${type} ] && rw="read"
 
-	printf "\tkasan_check_${rw}(${name}, sizeof(*${name}));\n"
+	printf "\t__atomic_check_${rw}(${name}, sizeof(*${name}));\n"
 }
 
 #gen_param_check(arg...)
@@ -107,7 +107,7 @@ cat <<EOF
 #define ${xchg}(ptr, ...)						\\
 ({									\\
 	typeof(ptr) __ai_ptr = (ptr);					\\
-	kasan_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
+	__atomic_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
 	arch_${xchg}(__ai_ptr, __VA_ARGS__);				\\
 })
 EOF
@@ -148,6 +148,19 @@ cat << EOF
 
 #include <linux/build_bug.h>
 #include <linux/kasan-checks.h>
+#include <linux/kcsan-checks.h>
+
+static inline void __atomic_check_read(const volatile void *v, size_t size)
+{
+	kasan_check_read(v, size);
+	kcsan_check_atomic_read(v, size);
+}
+
+static inline void __atomic_check_write(const volatile void *v, size_t size)
+{
+	kasan_check_write(v, size);
+	kcsan_check_atomic_write(v, size);
+}
 
 EOF
 
-- 
2.23.0.866.gb869b98d4c-goog
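
For readers skimming the generated header above: every instrumented
atomic now goes through one wrapper that forwards to both sanitizers
before calling the arch_ operation. Below is a minimal userspace sketch
of that pattern. It is not kernel code. Every demo_ name is a
hypothetical stand-in, the two check functions only count calls, and
the "arch" operation is a plain single-threaded add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified types; the kernel's s64/atomic64_t differ. */
typedef long long demo_s64;
typedef struct { demo_s64 counter; } demo_atomic64_t;

/* Stand-ins for the KASAN and KCSAN hooks: they only count calls. */
static unsigned int demo_kasan_write_checks;
static unsigned int demo_kcsan_atomic_write_checks;

static void demo_kasan_check_write(const volatile void *p, size_t size)
{
	(void)p;
	(void)size;
	demo_kasan_write_checks++;
}

static void demo_kcsan_check_atomic_write(const volatile void *p, size_t size)
{
	(void)p;
	(void)size;
	demo_kcsan_atomic_write_checks++;
}

/* Same shape as __atomic_check_write() in the patch: one call, two checks. */
static inline void demo_atomic_check_write(const volatile void *v, size_t size)
{
	demo_kasan_check_write(v, size);
	demo_kcsan_check_atomic_write(v, size);
}

/* Stand-in for arch_atomic64_add(); NOT atomic, illustration only. */
static inline void demo_arch_atomic64_add(demo_s64 i, demo_atomic64_t *v)
{
	v->counter += i;
}

/* Instrumented wrapper: check first, then defer to the arch operation. */
static inline void demo_atomic64_add(demo_s64 i, demo_atomic64_t *v)
{
	demo_atomic_check_write(v, sizeof(*v));
	demo_arch_atomic64_add(i, v);
}

/* Runs two instrumented adds and returns the resulting counter value. */
static demo_s64 demo_run(void)
{
	demo_atomic64_t v = { 0 };

	demo_kasan_write_checks = 0;
	demo_kcsan_atomic_write_checks = 0;

	demo_atomic64_add(5, &v);
	demo_atomic64_add(-2, &v);

	/* Each instrumented operation triggered both checks exactly once. */
	assert(demo_kasan_write_checks == 2);
	assert(demo_kcsan_atomic_write_checks == 2);

	return v.counter;
}
```

Keeping both checks behind one helper means the generator script emits
a single call per parameter, and a further checker could be added in
one place without regenerating a different call pattern.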




* [PATCH v2 8/8] x86, kcsan: Enable KCSAN for x86
  2019-10-17 14:12 ` Marco Elver
@ 2019-10-17 14:13   ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-17 14:13 UTC (permalink / raw)
  To: elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

This patch enables KCSAN for x86, with updates to build rules to not use
KCSAN for several incompatible compilation units.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Document build exceptions that had no existing comment explaining
  why we cannot instrument.
---
 arch/x86/Kconfig                      | 1 +
 arch/x86/boot/Makefile                | 2 ++
 arch/x86/boot/compressed/Makefile     | 2 ++
 arch/x86/entry/vdso/Makefile          | 3 +++
 arch/x86/include/asm/bitops.h         | 6 +++++-
 arch/x86/kernel/Makefile              | 7 +++++++
 arch/x86/kernel/cpu/Makefile          | 3 +++
 arch/x86/lib/Makefile                 | 4 ++++
 arch/x86/mm/Makefile                  | 3 +++
 arch/x86/purgatory/Makefile           | 2 ++
 arch/x86/realmode/Makefile            | 3 +++
 arch/x86/realmode/rm/Makefile         | 3 +++
 drivers/firmware/efi/libstub/Makefile | 2 ++
 13 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d6e1faa28c58..81859be4a005 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -226,6 +226,7 @@ config X86
 	select VIRT_TO_BUS
 	select X86_FEATURE_NAMES		if PROC_FS
 	select PROC_PID_ARCH_STATUS		if PROC_FS
+	select HAVE_ARCH_KCSAN if X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index e2839b5c246c..9c7942794164 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -9,7 +9,9 @@
 # Changed by many, many contributors over the years.
 #
 
+# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
 KASAN_SANITIZE			:= n
+KCSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Kernel does not boot with kcov instrumentation here.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 6b84afdd7538..a1c248b8439f 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -17,7 +17,9 @@
 #	(see scripts/Makefile.lib size_append)
 #	compressed vmlinux.bin.all + u32 size of vmlinux.bin.all
 
+# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
 KASAN_SANITIZE			:= n
+KCSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 0f2154106d01..a23debaad5b9 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -10,8 +10,11 @@ ARCH_REL_TYPE_ABS += R_386_GLOB_DAT|R_386_JMP_SLOT|R_386_RELATIVE
 include $(srctree)/lib/vdso/Makefile
 
 KBUILD_CFLAGS += $(DISABLE_LTO)
+
+# Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
 UBSAN_SANITIZE			:= n
+KCSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 7d1f6a49bfae..ee08917d3d92 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -201,8 +201,12 @@ arch_test_and_change_bit(long nr, volatile unsigned long *addr)
 	return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btc), *addr, c, "Ir", nr);
 }
 
-static __always_inline bool constant_test_bit(long nr, const volatile unsigned long *addr)
+static __no_kcsan_or_inline bool constant_test_bit(long nr, const volatile unsigned long *addr)
 {
+	/*
+	 * Because this is a plain access, we need to disable KCSAN here to
+	 * avoid double instrumentation via bitops-instrumented.h.
+	 */
 	return ((1UL << (nr & (BITS_PER_LONG-1))) &
 		(addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
 }
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3578ad248bc9..2aa122d94956 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -28,6 +28,13 @@ KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o				:= n
 KASAN_SANITIZE_paravirt.o				:= n
 
+# Do not instrument early boot code.
+KCSAN_SANITIZE_head$(BITS).o				:= n
+# Do not instrument debug code to avoid corrupting bug reporting.
+KCSAN_SANITIZE_dumpstack.o				:= n
+KCSAN_SANITIZE_dumpstack_$(BITS).o			:= n
+KCSAN_SANITIZE_stacktrace.o				:= n
+
 OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o	:= y
 OBJECT_FILES_NON_STANDARD_test_nx.o			:= y
 OBJECT_FILES_NON_STANDARD_paravirt_patch.o		:= y
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index d7a1e5a9331c..1f1b0edc0187 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -13,6 +13,9 @@ endif
 KCOV_INSTRUMENT_common.o := n
 KCOV_INSTRUMENT_perf_event.o := n
 
+# As above, instrumenting secondary CPU boot code causes boot hangs.
+KCSAN_SANITIZE_common.o := n
+
 # Make sure load_percpu_segment has no stackprotector
 nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_common.o		:= $(nostackp)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 5246db42de45..432a07705677 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -6,10 +6,14 @@
 # Produces uninteresting flaky coverage.
 KCOV_INSTRUMENT_delay.o	:= n
 
+# KCSAN uses udelay for introducing watchpoint delay; avoid recursion.
+KCSAN_SANITIZE_delay.o := n
+
 # Early boot use of cmdline; don't instrument it
 ifdef CONFIG_AMD_MEM_ENCRYPT
 KCOV_INSTRUMENT_cmdline.o := n
 KASAN_SANITIZE_cmdline.o  := n
+KCSAN_SANITIZE_cmdline.o  := n
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_cmdline.o = -pg
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 84373dc9b341..ee871602f96a 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -7,6 +7,9 @@ KCOV_INSTRUMENT_mem_encrypt_identity.o	:= n
 KASAN_SANITIZE_mem_encrypt.o		:= n
 KASAN_SANITIZE_mem_encrypt_identity.o	:= n
 
+KCSAN_SANITIZE_mem_encrypt.o		:= n
+KCSAN_SANITIZE_mem_encrypt_identity.o	:= n
+
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_mem_encrypt.o		= -pg
 CFLAGS_REMOVE_mem_encrypt_identity.o	= -pg
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index fb4ee5444379..69379bce9574 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -17,7 +17,9 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS
 LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib
 targets += purgatory.ro
 
+# Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE	:= n
+KCSAN_SANITIZE	:= n
 KCOV_INSTRUMENT := n
 
 # These are adjustments to the compiler flags used for objects that
diff --git a/arch/x86/realmode/Makefile b/arch/x86/realmode/Makefile
index 682c895753d9..6b1f3a4eeb44 100644
--- a/arch/x86/realmode/Makefile
+++ b/arch/x86/realmode/Makefile
@@ -6,7 +6,10 @@
 # for more details.
 #
 #
+
+# Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
+KCSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 subdir- := rm
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index f60501a384f9..fdbbb945c216 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -6,7 +6,10 @@
 # for more details.
 #
 #
+
+# Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
+KCSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 0460c7581220..693d0a94b118 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -31,7 +31,9 @@ KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 				   -D__DISABLE_EXPORTS
 
 GCOV_PROFILE			:= n
+# Sanitizer runtimes are unavailable and cannot be linked here.
 KASAN_SANITIZE			:= n
+KCSAN_SANITIZE			:= n
 UBSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
 
-- 
2.23.0.866.gb869b98d4c-goog


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-21 13:37     ` Alexander Potapenko
  -1 siblings, 0 replies; 88+ messages in thread
From: Alexander Potapenko @ 2019-10-21 13:37 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, Alan Stern, parri.andrea, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, boqun.feng,
	Borislav Petkov, dja, dlustig, dave.hansen, dhowells,
	Dmitriy Vyukov, H. Peter Anvin, Ingo Molnar, j.alglave, joel,
	Jonathan Corbet, Josh Poimboeuf, luc.maranget, Mark Rutland,
	npiggin, Paul McKenney, Peter Zijlstra, Thomas Gleixner, will,
	kasan-dev, linux-arch, linux-doc, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	x86

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred because the data value of the watched
> +memory location changed. This can occur due to either missing
> +instrumentation or, e.g., DMA accesses.
> +
> +Data-Races
> +----------
Nit: I was under the impression "data races" were commonly written
without a hyphen. I may be mistaken.
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
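The informal definition quoted above can be written down as a predicate.
This is only a sketch to make the four conditions explicit: struct access
and both function names are hypothetical and not part of the patch, and
whether two accesses are ordered by happens-before is passed in by the
caller, because deciding that requires the LKMM itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one memory access; not a kernel type. */
struct access {
	uintptr_t addr;		/* accessed memory location */
	bool is_write;		/* true for a write, false for a read */
	bool is_plain;		/* true for plain, false for marked/atomic */
	int thread;		/* id of the issuing thread */
};

/* Two accesses conflict if they hit the same location and one is a write. */
bool accesses_conflict(const struct access *a, const struct access *b)
{
	assert(a && b);
	return a->addr == b->addr && (a->is_write || b->is_write);
}

/*
 * A data-race needs: different threads, a conflict, at least one plain
 * access, and no happens-before ordering between the two accesses.
 * 'ordered' is an input here (assumption of this sketch).
 */
bool is_data_race(const struct access *a, const struct access *b,
		  bool ordered)
{
	return a->thread != b->thread &&
	       accesses_conflict(a, b) &&
	       (a->is_plain || b->is_plain) &&
	       !ordered;
}
```

With this reading, a plain write racing with a plain read of the same
address is a data-race only while the pair is unordered; marking both
accesses (e.g. WRITE_ONCE/READ_ONCE) makes the predicate false.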
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows developers to determine the possible executions of
> +concurrent code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
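To illustrate what "stores access type, size, and address in a long" could
look like, here is a minimal sketch. The bit layout (bit 63 for the write
flag, 7 bits for the size, 56 bits for the address) and the wp_* names are
assumptions made for this example; they are not taken from
kernel/kcsan/encoding.h, which may use a different layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only:
 *   bit 63      write flag
 *   bits 62..56 access size in bytes (1..127)
 *   bits 55..0  low bits of the address
 */
#define WP_WRITE_BIT	(UINT64_C(1) << 63)
#define WP_SIZE_SHIFT	56
#define WP_SIZE_MASK	UINT64_C(0x7f)
#define WP_ADDR_MASK	((UINT64_C(1) << WP_SIZE_SHIFT) - 1)

/* Pack one access into a single 64-bit word. */
uint64_t wp_encode(uint64_t addr, unsigned int size, bool is_write)
{
	assert(size >= 1 && size <= WP_SIZE_MASK);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack a word produced by wp_encode(). */
void wp_decode(uint64_t wp, uint64_t *addr, unsigned int *size,
	       bool *is_write)
{
	*addr = wp & WP_ADDR_MASK;
	*size = (unsigned int)((wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
}

/* True if the access [addr, addr + size) overlaps the watched range. */
bool wp_matches(uint64_t wp, uint64_t addr, unsigned int size)
{
	uint64_t wp_addr;
	unsigned int wp_size;
	bool wp_write;

	if (wp == 0)	/* 0 marks an empty slot; valid sizes are >= 1 */
		return false;
	wp_decode(wp, &wp_addr, &wp_size, &wp_write);
	addr &= WP_ADDR_MASK;
	return addr < wp_addr + wp_size && wp_addr < addr + size;
}
```

Because a whole watchpoint fits in one word, a slot can be read or updated
with a single atomic operation, which fits the claim in the quoted text
that no shared lock is needed in the fast path.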
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
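The three steps and the atomic-access rule above can be modelled in a few
lines of userspace C. This is a single-threaded sketch under stated
assumptions, not the code in kernel/kcsan/core.c: all names, the table
size and the sampling interval are made up, and the delay is represented
by whatever the caller does between watch_begin() and watch_end().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_WATCHPOINTS	4	/* hypothetical table size */
#define SAMPLE_INTERVAL	2000	/* hypothetical: watch 1 in 2000 accesses */
#define MAX_ACCESS_SIZE	8

enum race_kind { NO_RACE, RACE_OBSERVED, RACE_UNKNOWN_ORIGIN };

struct watchpoint {
	const volatile void *addr;	/* NULL: slot is free */
	size_t size;
	bool is_write;
	bool hit;			/* set by a conflicting access */
	unsigned char old[MAX_ACCESS_SIZE];	/* value before the delay */
};

static struct watchpoint watchpoints[NUM_WATCHPOINTS];

/* Step 1: look for an overlapping watchpoint; atomics stop here. */
enum race_kind check_access(const volatile void *p, size_t size,
			    bool is_write)
{
	uintptr_t a = (uintptr_t)p;

	for (int i = 0; i < NUM_WATCHPOINTS; i++) {
		struct watchpoint *wp = &watchpoints[i];
		uintptr_t w = (uintptr_t)wp->addr;

		if (!wp->addr || a >= w + wp->size || w >= a + size)
			continue;	/* free slot or no overlap */
		if (wp->is_write || is_write) {
			wp->hit = true;
			return RACE_OBSERVED;
		}
	}
	return NO_RACE;
}

/* Step 2, sampling decision: true once every SAMPLE_INTERVAL calls. */
bool should_watch(void)
{
	static unsigned int skipped;

	if (++skipped < SAMPLE_INTERVAL)
		return false;
	skipped = 0;
	return true;
}

/* Steps 2 and 3, before the delay: claim a slot, snapshot the value. */
struct watchpoint *watch_begin(const volatile void *p, size_t size,
			       bool is_write)
{
	if (size > MAX_ACCESS_SIZE)
		return NULL;
	for (int i = 0; i < NUM_WATCHPOINTS; i++) {
		struct watchpoint *wp = &watchpoints[i];

		if (wp->addr)
			continue;
		wp->addr = p;
		wp->size = size;
		wp->is_write = is_write;
		wp->hit = false;
		memcpy(wp->old, (const void *)p, size);
		return wp;
	}
	return NULL;	/* table full: skip this access */
}

/* Step 3, after the delay: report a hit, else compare the values. */
enum race_kind watch_end(struct watchpoint *wp)
{
	enum race_kind result = NO_RACE;

	assert(wp && wp->addr);
	if (wp->hit)
		result = RACE_OBSERVED;
	else if (memcmp(wp->old, (const void *)wp->addr, wp->size) != 0)
		result = RACE_UNKNOWN_ORIGIN;
	wp->addr = NULL;	/* release the slot */
	return result;
}
```

In this model a plain access would call check_access() and, when
should_watch() says so, watch_begin(), delay, watch_end(); an atomic
access would call only check_access(). A value change with no hit in
between corresponds to the "race at unknown origin" report described
earlier in the quoted documentation.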
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
I think the parameter indentation is a bit off here and below (I also
checked the GitHub diff). Have you considered running checkpatch.pl?
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
I think it should be OK to put {} on the same line as the function
prototype here; see e.g. include/linux/kasan.h.
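
As a sketch of that suggestion, covering only the six stubs declared in
this hunk, the !CONFIG_KCSAN branch could look roughly like this (the
<stdbool.h> include is only there so the snippet builds outside the
kernel tree):

```c
#include <stdbool.h>

/* Stubs for the !CONFIG_KCSAN case: empty bodies on the prototype line. */
static inline void kcsan_init(void) {}
static inline void kcsan_disable_current(void) {}
static inline void kcsan_enable_current(void) {}
static inline void kcsan_begin_atomic(bool nest) {}
static inline void kcsan_end_atomic(bool nest) {}
static inline void kcsan_atomic_next(int n) {}
```

This keeps the disabled-config half of the header to one line per
function without changing what callers see.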
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
Won't ctx->atomic_region suffice for both flat and non-flat regions?
(Do we really need the flat ones?)
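
For reference, the comment on struct kcsan_ctx gives the reason for the
separate flag: a flat region (seqlock reader) may sit inside a nestable
one (seqlock writer). A small userspace model may help weigh the
question. The model_* and single_* names are made up for illustration,
and the single-variable variant is only one guess at how a merged
counter might behave; it is not taken from the patch.

```c
#include <stdbool.h>

/* Userspace model of the two atomic-region fields in struct kcsan_ctx. */
struct ctx_model {
	int atomic_region;       /* nestable regions: a counter */
	bool atomic_region_flat; /* flat region: a single flag */
};

static void model_begin_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

/* Returns false where the patch would WARN about a mismatching end. */
static bool model_end_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest) {
		if (ctx->atomic_region == 0)
			return false;
		--ctx->atomic_region;
	} else {
		ctx->atomic_region_flat = false;
	}
	return true;
}

/* Mirrors the region check in is_atomic(). */
static bool model_in_atomic(const struct ctx_model *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}

/*
 * Hypothetical single-variable variant (an assumption, not in the patch):
 * a flat region sets the counter to 1 on begin and clears it on end.
 */
static void single_begin(int *region, bool nest)
{
	if (nest)
		++*region;
	else
		*region = 1;
}

static bool single_end(int *region, bool nest)
{
	if (nest) {
		if (*region == 0)
			return false; /* the "mismatching" case */
		--*region;
	} else {
		*region = 0;
	}
	return true;
}
```

In this model, ending a flat region inside a nestable one leaves the
outer region active with two variables, whereas the single-variable
guess clears the counter and the outer end then looks mismatched.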
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
You need to use braces in both branches here:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces
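For illustration, a minimal userspace sketch of the same grow-on-demand logic with braces on both branches. The struct and function names are made up for the example, and calloc()/free() stand in for kvmalloc_array()/kvfree(); the sketch also checks the allocation result, which is an addition of mine rather than something the patch does.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for report_filterlist. */
struct filterlist {
	unsigned long *addrs; /* array of addresses */
	size_t size;          /* current capacity */
	size_t used;          /* number of elements used */
};

/* Appends addr; returns false if an allocation fails (list unchanged). */
static bool filterlist_push(struct filterlist *fl, unsigned long addr)
{
	if (fl->size == 0)
		fl->size = 8; /* small initial size */

	if (fl->addrs == NULL) {
		/* initial allocation */
		fl->addrs = calloc(fl->size, sizeof(*fl->addrs));
		if (fl->addrs == NULL)
			return false;
	} else if (fl->used == fl->size) {
		/* resize: double the capacity and copy the old entries */
		unsigned long *new_addrs =
			calloc(fl->size * 2, sizeof(*new_addrs));

		if (new_addrs == NULL)
			return false;
		memcpy(new_addrs, fl->addrs, fl->used * sizeof(*new_addrs));
		free(fl->addrs);
		fl->addrs = new_addrs;
		fl->size *= 2;
	}

	fl->addrs[fl->used++] = addr;
	return true;
}
```

Once one branch of the if/else chain needs braces, the coding style asks for them on all branches, which is the only point of the sketch.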
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
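To check my understanding of the encoding, here is a standalone userspace sketch that packs the is-write bit, the size and the truncated address into one 64-bit word and unpacks it again. The constants (64-bit word, 14 size bits) and the `_wp` function names are assumptions for the example, and the mask bounds are chosen so that the three fields tile the word exactly; they are not copied from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64
#define SIZE_BITS 14                          /* assumed: sizes up to 2 * 4096 */
#define ADDR_BITS (WORD_BITS - 1 - SIZE_BITS) /* 49 low bits keep the address */

#define WRITE_MASK (UINT64_C(1) << (WORD_BITS - 1))
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)
#define SIZE_MASK  (~(WRITE_MASK | ADDR_MASK))

/* Pack access info into one word; upper address bits are discarded. */
static uint64_t encode_wp(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WRITE_MASK : 0) |
	       (((uint64_t)size << ADDR_BITS) & SIZE_MASK) |
	       (addr & ADDR_MASK);
}

/* Unpack; only the masked address can be recovered. */
static void decode_wp(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
}

/* True if [addr1, addr1 + size1) and [addr2, addr2 + size2) overlap. */
static bool matching_access(uint64_t addr1, size_t size1,
			    uint64_t addr2, size_t size2)
{
	uint64_t end_range1 = addr1 + size1 - 1;
	uint64_t end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}
```

Two addresses that differ only above bit 48 encode to the same word in this sketch, which is the false-positive case the comment describes; comparing the full, unmasked addresses with matching_access() is what tells them apart.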
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
Labels in enums should be capitalized:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl
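Something along these lines, shown as a shortened standalone sketch with only three of the counters. The capitalized names are just my suggestion for the spelling, not anything the patch defines.

```c
#include <stddef.h>
#include <string.h>

enum kcsan_counter_id {
	/* Number of watchpoints currently in use. */
	KCSAN_COUNTER_USED_WATCHPOINTS,

	/* Total number of watchpoints set up. */
	KCSAN_COUNTER_SETUP_WATCHPOINTS,

	/* Total number of data-races. */
	KCSAN_COUNTER_DATA_RACES,

	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* Maps a counter id to its debugfs name; NULL for out-of-range ids. */
static const char *counter_to_name(enum kcsan_counter_id id)
{
	switch (id) {
	case KCSAN_COUNTER_USED_WATCHPOINTS:
		return "used_watchpoints";
	case KCSAN_COUNTER_SETUP_WATCHPOINTS:
		return "setup_watchpoints";
	case KCSAN_COUNTER_DATA_RACES:
		return "data_races";
	case KCSAN_COUNTER_COUNT:
		break;
	}
	return NULL;
}
```

The strings printed in debugfs can stay lowercase; only the enumerators change.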
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
FWIW another option is to put all KCSAN-related functions in a
separate code section and check if the function addresses are in the
address range belonging to that section.
This will work even with non-symbolized stacks.
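A rough userspace sketch of that idea follows. The `text_range` struct is hypothetical; in the kernel the bounds would presumably come from linker-script symbols bracketing the new section, much like __sched_text_start/__sched_text_end do for scheduler functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of a dedicated KCSAN text section. */
struct text_range {
	uintptr_t start; /* inclusive */
	uintptr_t end;   /* exclusive */
};

static bool in_text_range(const struct text_range *r, uintptr_t addr)
{
	return addr >= r->start && addr < r->end;
}

/*
 * Number of leading stack entries that lie inside the KCSAN section.
 * No symbol lookup or string matching is needed, only address compares.
 */
static int get_stack_skipnr(const struct text_range *kcsan_text,
			    const uintptr_t stack_entries[], int num_entries)
{
	int skip = 0;

	while (skip < num_entries &&
	       in_text_range(kcsan_text, stack_entries[skip]))
		skip++;
	return skip;
}
```

Note that the current string match also skips the "_once_size" helpers, so those would either have to live in the same section or be handled separately.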
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
IIRC checkpatch.pl expects SPDX headers in .c files to look like this one
(C++-style "//"), while header files use the C-style "/* */" form.
Please double check the headers in the other files and fix them if necessary.

This file could also use some comments; right now it's not easy to
understand what it's testing.
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
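To make it clearer what this round-trip test checks, here is a minimal user-space sketch of a watchpoint encoding. The bit layout (top bit for the access type, 14 bits for the size, the remaining low bits for the address) and the names encode_wp, decode_wp, SIZE_BITS and INVALID_WP are assumptions for illustration; the actual layout lives in encoding.h and may differ. The sketch models only one special value, whereas the test also checks CONSUMED_WATCHPOINT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a 64-bit word: [write:1][size:14][address:49]. */
#define SIZE_BITS	14
#define ADDR_BITS	(64 - 1 - SIZE_BITS)
#define WRITE_MASK	(UINT64_C(1) << 63)
#define SIZE_MASK	(((UINT64_C(1) << SIZE_BITS) - 1) << ADDR_BITS)
#define ADDR_MASK	((UINT64_C(1) << ADDR_BITS) - 1)
#define INVALID_WP	UINT64_C(0)	/* never produced when size >= 1 */

/* Pack access type, size and (truncated) address into one word. */
static uint64_t encode_wp(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Unpack a word; returns false for the special "no watchpoint" value. */
static bool decode_wp(uint64_t enc, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	if (enc == INVALID_WP)
		return false;

	*masked_addr = enc & ADDR_MASK;
	*size = (size_t)((enc & SIZE_MASK) >> ADDR_BITS);
	*is_write = (enc & WRITE_MASK) != 0;
	return true;
}
```

Because the address is truncated to fit, the decoded address is only comparable to the masked original, which is why the test compares against addr & WATCHPOINT_ADDR_MASK rather than addr.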
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
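The five cases above read like an interval-overlap check on (address, size) pairs. A minimal sketch under that assumption (ranges_overlap is a hypothetical stand-in; the real matching_access() is defined in encoding.h and is not shown in this hunk):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): two accesses match when
 * their byte ranges [addr, addr + size - 1] share at least one byte.
 * Sizes are assumed to be >= 1.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1;	/* last byte of access 1 */
	unsigned long end2 = addr2 + size2 - 1;	/* last byte of access 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

Under this reading, the first three cases are the same byte, a 2-byte access covering the next byte, and a 2-byte access starting one byte earlier; the last two are adjacent but disjoint single bytes. A comment along those lines in the test would answer my question above.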
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
Wouldn't it be better to decide at runtime whether to ignore atomics?

> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
Overall, I think it's better to make most of these configs boot-time flags.
This way one won't need to rebuild the kernel every time they want to
turn some option on or off.
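To illustrate what I mean by a boot-time flag, here is a minimal user-space sketch that reads a boolean option from a command-line string. In the kernel this would normally go through the existing module_param() or __setup() machinery instead; the helper cmdline_bool and the option name kcsan.ignore_atomics are hypothetical and only show the idea of a default that can be overridden without a rebuild.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look for "key=<v>" in a space-separated command
 * line and return true for 1/y/Y, false for 0/n/N. If the key is absent
 * or its value is not recognized, return the compiled-in default.
 */
static bool cmdline_bool(const char *cmdline, const char *key, bool def)
{
	size_t klen = strlen(key);
	const char *p = cmdline;

	if (klen == 0)
		return def;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept matches that start a new word. */
		bool at_start = (p == cmdline) || (p[-1] == ' ');

		if (at_start && p[klen] == '=') {
			char v = p[klen + 1];

			if (v == '1' || v == 'y' || v == 'Y')
				return true;
			if (v == '0' || v == 'n' || v == 'N')
				return false;
		}
		p += klen;
	}
	return def;
}
```

The Kconfig value would then only provide the default passed as def, e.g. cmdline_bool(cmdline, "kcsan.ignore_atomics", false).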

> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
"KernelConcurrencySanitizer" or "Kernel Concurrency Sanitizer", maybe?
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


-- 
Alexander Potapenko
Software Engineer



* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-21 13:37     ` Alexander Potapenko
  0 siblings, 0 replies; 88+ messages in thread
From: Alexander Potapenko @ 2019-10-21 13:37 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, Alan Stern, parri.andrea, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, boqun.feng,
	Borislav Petkov, dja, dlustig, dave.hansen, dhowells,
	Dmitriy Vyukov, H. Peter Anvin, Ingo Molnar, j.alglave, joel,
	Jonathan Corbet, Josh Poimboeuf, luc.maranget, Mark Rutland,
	npiggin, Paul McKenney, Peter Zijlstra, Thomas Gleixner

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
Nit: I was under the impression "data races" were commonly written
without a hyphen. I may be mistaken.
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows one to determine the possible executions of concurrent
> +code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
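The three steps above can be sketched in user space as follows. This is a single-threaded model under stated assumptions, not the patch's implementation: it uses one watchpoint slot instead of an array of encoded longs, a callback in place of the real stall, and a "consumed" flag to tell the watching side that another access hit its watchpoint. All names (check_access, soft_watchpoint, SKIP_INST, device_write, racing_write) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKIP_INST 3	/* stand-in for CONFIG_KCSAN_WATCH_SKIP_INST */

enum outcome { NO_RACE, RACE, RACE_UNKNOWN_ORIGIN };

typedef void (*delay_fn)(void *arg);	/* models what happens during the stall */

struct soft_watchpoint {
	uintptr_t addr;
	size_t size;
	bool is_write;
	bool armed;
	bool consumed;	/* set when a conflicting access hits the watchpoint */
};

static struct soft_watchpoint wp;	/* single slot, for brevity */
static unsigned int skip_counter;

static bool overlaps(uintptr_t a1, size_t s1, uintptr_t a2, size_t s2)
{
	return a1 <= a2 + s2 - 1 && a2 <= a1 + s1 - 1;
}

static enum outcome check_access(const void *ptr, size_t size, bool is_write,
				 delay_fn delay, void *arg)
{
	uintptr_t addr = (uintptr_t)ptr;
	uint64_t before = 0, after = 0;
	size_t n = size < sizeof(before) ? size : sizeof(before);
	enum outcome ret = NO_RACE;

	/* Step 1: a matching watchpoint with at least one write is a race. */
	if (wp.armed && overlaps(addr, size, wp.addr, wp.size) &&
	    (is_write || wp.is_write)) {
		wp.consumed = true;
		return RACE;
	}

	/* Step 2: only every SKIP_INST-th access sets up a watchpoint. */
	if (wp.armed || ++skip_counter < SKIP_INST)
		return NO_RACE;
	skip_counter = 0;

	wp = (struct soft_watchpoint){ .addr = addr, .size = size,
				       .is_write = is_write, .armed = true };
	memcpy(&before, ptr, n);	/* step 3: value before the delay */
	delay(arg);			/* stall; others may run here */
	memcpy(&after, ptr, n);		/* step 3: value after the delay */

	if (wp.consumed)
		ret = RACE;			/* the other access is known */
	else if (before != after)
		ret = RACE_UNKNOWN_ORIGIN;	/* e.g. uninstrumented or DMA */
	wp.armed = false;
	return ret;
}

/* Models an uninstrumented writer (e.g. a device) changing the value. */
static void device_write(void *arg)
{
	++*(int *)arg;
}

/* Models an instrumented write arriving while the watchpoint is armed. */
static enum outcome nested;
static void racing_write(void *arg)
{
	nested = check_access(arg, sizeof(int), true, NULL, NULL);
}
```

Passing device_write as the delay callback produces the "race at unknown origin" outcome on a sampled access, while racing_write produces a race with both accesses known. Atomic accesses would only ever take the step 1 path in this model.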
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
I think the parameter indentation is a bit off here and below (I've
also looked at the GitHub diff);
have you considered running checkpatch.pl?
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
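To check my reading of the comment above, here is a minimal userspace
model of the two fields (not kernel code; struct atomic_model and the
model_* names are made up for illustration). It shows a flat region
ending inside a nestable one without closing the outer region:

```c
#include <stdbool.h>

struct atomic_model {
	int atomic_region;		/* nestable regions: a depth counter */
	bool atomic_region_flat;	/* flat region: a single flag */
};

static void model_begin_atomic(struct atomic_model *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void model_end_atomic(struct atomic_model *ctx, bool nest)
{
	if (nest)
		--ctx->atomic_region;	/* the patch also warns on underflow */
	else
		ctx->atomic_region_flat = false;
}

/* True while any atomic region, nestable or flat, is open. */
static bool model_in_atomic(const struct atomic_model *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

After begin(true), begin(false), end(false), the model still reports an
open atomic region, and only the matching end(true) closes it.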
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
I think it is fine to put {} on the same line as the function
prototype here; see e.g. include/linux/kasan.h.
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
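For anyone else reading along, a small worked example of the visiting
order SLOT_IDX produces. The two constants are assumed values picked
for illustration (the real definitions are not in this hunk), and
slot_order() is a hypothetical helper:

```c
/* Assumed values for illustration only. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64

#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                              \
	((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -     \
		  KCSAN_CHECK_ADJACENT)) %                             \
	 KCSAN_NUM_WATCHPOINTS)

/* Fill out[] with the slots visited for `slot`, in visiting order. */
static void slot_order(int slot, int out[CHECK_NUM_SLOTS])
{
	int i;

	for (i = 0; i < CHECK_NUM_SLOTS; ++i)
		out[i] = SLOT_IDX(slot, i);
}
```

With these values, slot 10 is visited as 10, 11, 9: the slot itself,
then the right neighbour, then the left one. The sketch only exercises
a slot away from the array edges; behaviour at the edges depends on the
operand types, which this hunk does not show.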
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
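The insert/consume/remove helpers above form a small lock-free
protocol on a single word. A userspace sketch with C11 atomics, to
make the state transitions explicit. The wp_* names are hypothetical,
and the CONSUMED_WATCHPOINT value is an assumption (the patch defines
the real encodings in encoding.h; only INVALID_WATCHPOINT being zero is
stated in the comment on watchpoints[]):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L	/* matches the zero-initialized state */
#define CONSUMED_WATCHPOINT 1L	/* assumed value for this sketch */

/* Claim an empty slot for `encoded`; fails if the slot is in use. */
static bool wp_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(wp, &expect, encoded,
			memory_order_relaxed, memory_order_relaxed);
}

/* Mark the slot consumed, but only if it still holds the value we read. */
static bool wp_try_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(wp, &encoded,
			CONSUMED_WATCHPOINT,
			memory_order_relaxed, memory_order_relaxed);
}

/* Free the slot; returns false if a racing access consumed it meanwhile. */
static bool wp_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

The thread that set up the watchpoint learns about a race from the
return value of wp_remove(), and a second consumer of the same
watchpoint fails the compare-exchange.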
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
Won't ctx->atomic_region suffice for both flat and non-flat regions?
(Do we really need the flat ones?)
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
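A toy model of the sampling decision, as I understand it from the
comments in should_watch(). WATCH_SKIP_INST is an assumed small value
standing in for CONFIG_KCSAN_WATCH_SKIP_INST, and a plain static
counter stands in for the per-CPU one:

```c
#include <stdbool.h>

/* Assumed value; stands in for CONFIG_KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP_INST 4

/* Stand-in for the per-CPU kcsan_skip counter (a single "CPU" here). */
static unsigned long skip_counter;

/* Select every WATCH_SKIP_INST-th non-atomic access for a watchpoint. */
static bool should_watch_model(bool access_is_atomic)
{
	/* Atomic accesses bail out first, so they do not advance the counter. */
	if (access_is_atomic)
		return false;

	return (++skip_counter % WATCH_SKIP_INST) == 0;
}
```

With a skip value of 4, the fourth plain access is selected, and an
atomic access in between does not shift which one that is.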
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
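A simplified userspace model of the disable counter, in case it helps
other reviewers. struct ctx_model and the model_* names are
hypothetical, and a mismatch counter stands in for WARN():

```c
#include <stdbool.h>

struct ctx_model {
	int disable;	/* nesting depth of disable calls */
	int mismatches;	/* stands in for WARN() in this model */
};

static void model_disable(struct ctx_model *ctx)
{
	++ctx->disable;
}

static void model_enable(struct ctx_model *ctx)
{
	if (ctx->disable-- == 0) {
		ctx->disable = 0;	/* undo the decrement */
		++ctx->mismatches;	/* unbalanced enable */
	}
}

/* Checks run only while the counter is back at zero. */
static bool model_is_enabled(const struct ctx_model *ctx)
{
	return ctx->disable == 0;
}
```

Two disables need two enables before checks resume, and one enable too
many leaves the counter at zero and records a single mismatch.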
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
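The "unknown origin" path boils down to snapshot, delay, re-read. A
minimal sketch of just that part for the 4-byte case. The delay_fn
hook and the helper names are made up for illustration; in the patch
the delay is udelay() and the sizes 1, 2, 4 and 8 are handled:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook standing in for udelay(); tests may modify memory here. */
typedef void (*delay_fn)(void *arg);

/*
 * Snapshot a 4-byte location, wait, then re-read it. Returns true if the
 * value changed in between.
 */
static bool changed_during_delay(const volatile uint32_t *ptr,
				 delay_fn delay, void *arg)
{
	uint32_t expect = *ptr;

	delay(arg);

	return expect != *ptr;
}

/* Delay that leaves memory untouched. */
static void delay_noop(void *arg)
{
	(void)arg;
}

/* Delay that simulates a non-instrumented writer, e.g. a device. */
static void delay_store(void *arg)
{
	*(volatile uint32_t *)arg = 0xdeadbeef;
}
```

A write that stores the same value back would go unnoticed by this
comparison, so it can only confirm a race, not rule one out.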
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
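A userspace sketch of the lookup in kcsan_skip_report(), with qsort()
and bsearch() from libc standing in for the kernel's sort() and
bsearch(), and without the locking and kallsyms steps. The struct and
function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

struct filterlist_model {
	unsigned long *addrs;
	size_t used;
	bool sorted;
	bool whitelist;
};

/* Returns true if a report for func_addr should be suppressed. */
static bool skip_report_model(struct filterlist_model *fl,
			      unsigned long func_addr)
{
	bool found;

	if (fl->used == 0)
		return false;

	/* Sort lazily on first lookup after an insertion. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(unsigned long), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_addr, fl->addrs, fl->used,
			sizeof(unsigned long), cmp_addrs) != NULL;

	/* In whitelist mode, everything not listed is skipped. */
	return fl->whitelist ? !found : found;
}
```

As in the patch, an empty list never suppresses a report, regardless
of the whitelist flag.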
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
You need to use braces in both branches here:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces
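
Something along these lines, as a userspace sketch of the same grow-on-insert logic. calloc()/free() stand in for kvmalloc_array()/kvfree(), the struct and function names are made up for illustration, and the allocation checks are my addition:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for report_filterlist; size must be non-zero. */
struct filterlist {
	unsigned long *addrs;
	size_t size; /* capacity in elements */
	size_t used; /* elements in use */
};

/* Appends addr, growing the array if needed. Returns 0 or -1 on failure. */
int filterlist_insert(struct filterlist *fl, unsigned long addr)
{
	if (fl->addrs == NULL) {
		/* initial allocation */
		fl->addrs = calloc(fl->size, sizeof(*fl->addrs));
		if (fl->addrs == NULL)
			return -1;
	} else if (fl->used == fl->size) {
		/* resize filterlist */
		unsigned long *new_addrs;

		new_addrs = calloc(fl->size * 2, sizeof(*new_addrs));
		if (new_addrs == NULL)
			return -1;
		memcpy(new_addrs, fl->addrs, fl->used * sizeof(*new_addrs));
		free(fl->addrs);
		fl->addrs = new_addrs;
		fl->size *= 2;
	}

	fl->addrs[fl->used++] = addr;
	return 0;
}
```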
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
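If I read the strncmp() calls right, the keywords are matched as prefixes, so e.g. "online" is accepted as "on". A userspace sketch of the same dispatch, to make that concrete (enum cmd and parse_cmd() are names I made up; the real function acts on the command instead of returning it):

```c
#include <string.h>

enum cmd {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_FILTER,
	CMD_INVALID,
};

/*
 * Classifies an already-stripped command string. For CMD_FILTER, *func is
 * set to the function name following the '!'; otherwise it is NULL.
 */
enum cmd parse_cmd(const char *arg, const char **func)
{
	*func = NULL;

	/* Comparing only strlen(keyword) bytes makes these prefix matches. */
	if (!strncmp(arg, "on", sizeof("on") - 1))
		return CMD_ON;
	if (!strncmp(arg, "off", sizeof("off") - 1))
		return CMD_OFF;
	if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
		return CMD_WHITELIST;
	if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
		return CMD_BLACKLIST;
	if (arg[0] == '!') {
		*func = &arg[1];
		return CMD_FILTER;
	}
	return CMD_INVALID;
}
```

Not a big deal for a debugfs file, but a plain strcmp() on the stripped string would reject such input.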
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> + * bits; however, most 64-bit architectures do not use the full 64-bit address
> + * space. For a false positive to be observable, 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
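For anyone else following along, here is a userspace model of the encoding as I understand it: one word holding is-write in the top bit, then the size, then the low address bits. The constants are assumptions (64-bit long, 4 KiB pages, KCSAN_CHECK_ADJACENT == 1, hence 14 size bits), and the masks are written so that they tile the word exactly, which is a simplification and not a copy of the GENMASK() expressions above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_BITS  14 /* assumption: bits_per(2 * 4096) */
#define ADDR_BITS  (64 - 1 - SIZE_BITS)
#define WRITE_MASK (UINT64_C(1) << 63)
#define SIZE_MASK  (((UINT64_C(1) << SIZE_BITS) - 1) << ADDR_BITS)
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)

/* Packs is_write, size and the low ADDR_BITS of addr into one word. */
uint64_t encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Unpacks a watchpoint; returns false for the two reserved values. */
bool decode(uint64_t wp, uint64_t *addr_masked, size_t *size, bool *is_write)
{
	if (wp == 0 || wp == 1) /* INVALID_WATCHPOINT, CONSUMED_WATCHPOINT */
		return false;

	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

Two addresses that differ only above bit ADDR_BITS - 1 encode to the same word, which is the aliasing the comment above describes.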
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
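So every address in a page shares a slot, and pages KCSAN_NUM_WATCHPOINTS apart alias. A tiny userspace sketch, assuming 4 KiB pages and the 64 slots from kcsan.h (the macro and function names here are mine):

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */
#define NUM_WATCHPOINTS   64u   /* KCSAN_NUM_WATCHPOINTS in this series */

/* Maps an address to its watchpoint slot: page number modulo slot count. */
unsigned int slot_for(uint64_t addr)
{
	return (unsigned int)((addr / PAGE_SIZE_ASSUMED) % NUM_WATCHPOINTS);
}
```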
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
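For readers not used to this pattern: each invocation stamps out a pair of functions whose names embed the access size, and the caller's trailing semicolon completes the last statement of the macro. A userspace sketch, with demo_ names and recording stubs of my own in place of __kcsan_check_{read,write}() and EXPORT_SYMBOL():

```c
#include <stddef.h>

/* Stubs that just record the size they were called with. */
size_t last_read_size;
size_t last_write_size;

void check_read(void *ptr, size_t size)
{
	(void)ptr;
	last_read_size = size;
}

void check_write(void *ptr, size_t size)
{
	(void)ptr;
	last_write_size = size;
}

/*
 * Defines demo_tsan_read<size>() and demo_tsan_write<size>(). The final
 * prototype plays the role EXPORT_SYMBOL() has in the patch: it absorbs the
 * semicolon written after the macro invocation.
 */
#define DEFINE_DEMO_READ_WRITE(size)                                           \
	void demo_tsan_read##size(void *ptr)                                   \
	{                                                                      \
		check_read(ptr, size);                                         \
	}                                                                      \
	void demo_tsan_write##size(void *ptr)                                  \
	{                                                                      \
		check_write(ptr, size);                                        \
	}                                                                      \
	void demo_tsan_write##size(void *ptr)

DEFINE_DEMO_READ_WRITE(4);
DEFINE_DEMO_READ_WRITE(16);
```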
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but the compiler can still emit them.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will:
> + *
> + *     1. avoid excessive contention between watchpoint checks and setup;
> + *     2. allow a larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
Labels in enums should be capitalized:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl
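
For example (abbreviated to three entries; the name strings other than "encoding_false_positives" are my guess at what counter_to_name() returns):

```c
#include <stddef.h>

enum kcsan_counter_id {
	/* Total number of data-races. */
	KCSAN_COUNTER_DATA_RACES,

	/* Number of times no watchpoints were available. */
	KCSAN_COUNTER_NO_CAPACITY,

	/* Watchpoint encoding caused a watchpoint to fire on mismatching accesses. */
	KCSAN_COUNTER_ENCODING_FALSE_POSITIVES,

	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* Returns the printable name of a counter, or NULL for the sentinel. */
const char *counter_to_name(enum kcsan_counter_id id)
{
	switch (id) {
	case KCSAN_COUNTER_DATA_RACES:
		return "data_races";
	case KCSAN_COUNTER_NO_CAPACITY:
		return "no_capacity";
	case KCSAN_COUNTER_ENCODING_FALSE_POSITIVES:
		return "encoding_false_positives";
	case KCSAN_COUNTER_COUNT:
		break;
	}
	return NULL;
}
```

The same applies to enum kcsan_report_type further down.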
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from the other racing thread to the thread
> + * that set up the watchpoint, which then prints the complete report atomically.
> + * Only one struct is needed, as all threads have to be serialized to print the
> + * reports anyway, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
FWIW another option is to put all KCSAN-related functions in a
separate code section and check if the function addresses are in the
address range belonging to that section.
This will work even with non-symbolized stacks.
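
Roughly what I have in mind, as a userspace sketch. sec_start and sec_end stand for the bounds of that section; in the kernel they would presumably come from linker-script symbols, which I have not tried to name here, so treat everything below as placeholders:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr lies in the half-open range [start, end). */
bool addr_in_section(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	return addr >= start && addr < end;
}

/*
 * Counts the leading stack entries that fall inside the (hypothetical)
 * KCSAN text section, i.e. the frames to skip when printing.
 */
size_t stack_skipnr(const uintptr_t *entries, size_t num_entries,
		    uintptr_t sec_start, uintptr_t sec_end)
{
	size_t skip = 0;

	while (skip < num_entries &&
	       addr_in_section(entries[skip], sec_start, sec_end))
		++skip;
	return skip;
}
```

This needs no snprintf() or string matching per frame, at the cost of having to annotate every function that should be skipped.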
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
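The title ordering is a nice touch for deduplication: whichever thread ends up printing, the same pair of functions yields the same title. A userspace sketch with plain strings in place of the %ps symbolization (format_title() is a made-up helper):

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes "data-race in A / B" into buf, with A and B being the two function
 * names in lexicographic order. Returns what snprintf() returns.
 */
int format_title(char *buf, size_t len, const char *this_fn,
		 const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	return snprintf(buf, len, "data-race in %s / %s",
			cmp < 0 ? other_fn : this_fn,
			cmp < 0 ? this_fn : other_fn);
}
```

Calling it with ("foo", "bar") and with ("bar", "foo") produces the same string in both cases.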
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
IIRC checkpatch.pl wants the SPDX line in .c files to look like this one
(C++-style), while headers keep the C-style /* */ form (see
Documentation/process/license-rules.rst).
Please double check the headers in the other files and fix them if necessary.

This file could also use some comments; as it stands, it's not easy to
tell what it's testing.
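
As an illustration of the level of commenting I mean, here is test_matching_access() annotated, together with a userspace copy of matching_access() so that it builds standalone (WARN_ON() dropped for the same reason):

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace copy of matching_access(): do the two byte ranges overlap? */
bool matching_access(unsigned long addr1, size_t size1,
		     unsigned long addr2, size_t size2)
{
	unsigned long end_range1 = addr1 + size1 - 1;
	unsigned long end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}

/* Checks overlapping and merely adjacent accesses around address 10. */
bool test_matching_access(void)
{
	/* Identical 1-byte accesses overlap. */
	if (!matching_access(10, 1, 10, 1))
		return false;
	/* [10, 11] covers byte 11. */
	if (!matching_access(10, 2, 11, 1))
		return false;
	/* Byte 10 is covered by [9, 10]. */
	if (!matching_access(10, 1, 9, 2))
		return false;
	/* Bytes 10 and 11 are adjacent but do not overlap. */
	if (matching_access(10, 1, 11, 1))
		return false;
	/* Same with the arguments on the other side: bytes 9 and 10. */
	if (matching_access(9, 1, 10, 1))
		return false;
	return true;
}
```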
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
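For anyone reading along: the five cases above are consistent with a plain
range-overlap check. Below is a minimal userspace sketch, assuming (from these
test cases alone, not from encoding.h) that each access covers the half-open
byte range [addr, addr + size). The name ranges_overlap is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): two accesses match if the
 * half-open byte ranges [addr1, addr1 + size1) and [addr2, addr2 + size2)
 * share at least one byte.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	return addr1 < addr2 + size2 && addr2 < addr1 + size1;
}
```

Under that assumption, adjacent but disjoint accesses such as (10, 1) and
(11, 1) do not match, which is what the last two cases check.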
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
Wouldn't it be better to decide at runtime whether to ignore atomics?

> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
Overall, I think it's better to make most of these configs boot-time flags.
This way one won't need to rebuild the kernel every time they want to
turn some option on or off.
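
To illustrate what I mean, here is a rough userspace sketch of the atomic-check
path consulting a runtime flag instead of CONFIG_KCSAN_IGNORE_ATOMICS. All
names (ignore_atomics, check_watchpoint_stub, check_atomic_read) are
hypothetical, and the lookup is a stub. In the kernel the flag would
presumably be set from a boot parameter or via debugfs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime switch; would replace CONFIG_KCSAN_IGNORE_ATOMICS. */
static bool ignore_atomics;

/* Counts watchpoint lookups so the effect of the flag is observable. */
static unsigned long lookups;

/* Stub standing in for the real watchpoint lookup; always reports no race. */
static bool check_watchpoint_stub(const volatile void *ptr, size_t size,
				  bool is_write)
{
	(void)ptr;
	(void)size;
	(void)is_write;
	++lookups;
	return true;
}

/* Marked atomic read: skip the lookup entirely when the flag is set. */
static bool check_atomic_read(const volatile void *ptr, size_t size)
{
	if (ignore_atomics)
		return true;
	return check_watchpoint_stub(ptr, size, false);
}
```

The cost is one extra branch in the fast path, which is the trade-off against
not having to rebuild the kernel.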

> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
"KernelConcurrencySanitizer" or "Kernel Concurrency Sanitizer", maybe?
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-21 13:37     ` Alexander Potapenko
  0 siblings, 0 replies; 88+ messages in thread
From: Alexander Potapenko @ 2019-10-21 13:37 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, Alan Stern, parri.andrea, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, boqun.feng,
	Borislav Petkov, dja, dlustig, dave.hansen, dhowells,
	Dmitriy Vyukov, H. Peter Anvin, Ingo Molnar, j.alglave, joel,
	Jonathan Corbet, Josh Poimboeuf, luc.maranget, Mark Rutland,
	npiggin, Paul McKenney, Peter Zijlstra, Thomas Gleixner, will,
	kasan-dev, linux-arch, linux-doc, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	x86

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes are a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
Nit: I was under the impression "data races" were commonly written
without a hyphen. I may be mistaken.
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows to determine the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
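To make the encoding concrete, here is a userspace sketch of packing access
type, size, and address into one 64-bit word. The bit layout (write flag in
the top bit, 14 size bits below it, masked address in the remaining low bits),
the zero "empty slot" value, and all names are assumptions for illustration.
The actual layout is whatever kernel/kcsan/encoding.h defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WP_WRITE_BIT (UINT64_C(1) << 63)	/* assumed: top bit = write */
#define WP_SIZE_BITS 14				/* assumed size field width */
#define WP_ADDR_BITS (63 - WP_SIZE_BITS)	/* remaining low bits */
#define WP_ADDR_MASK ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK ((UINT64_C(1) << WP_SIZE_BITS) - 1)
#define WP_INVALID   UINT64_C(0)		/* assumed "empty slot" value */

/* Pack (addr, size, is_write) into a single word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       (((uint64_t)size & WP_SIZE_MASK) << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack a word; returns false for the empty-slot value. */
static bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	if (wp == WP_INVALID)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_ADDR_BITS) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the address is masked, decoding only recovers the low address bits,
which is why the selftest compares against addr & WATCHPOINT_ADDR_MASK.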
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall some delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
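The three steps above can be sketched in userspace as follows. This is a
single-threaded simulation under stated assumptions: one watchpoint slot
instead of an array, no sampling (every call arms the watchpoint), and a
stall() callback standing in for the delay during which other CPUs would run.
All names are hypothetical and none of this is the code in kernel/kcsan/core.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum race { RACE_NONE, RACE_OBSERVED, RACE_UNKNOWN_ORIGIN };

/* A single soft watchpoint slot. */
static struct {
	bool armed;
	bool hit;
	uintptr_t addr;
	size_t size;
	bool is_write;
} wp;

/* Step 1: called by every instrumented access. Returns true if no race. */
static bool check_watchpoint(const void *ptr, size_t size, bool is_write)
{
	uintptr_t p = (uintptr_t)ptr;

	if (!wp.armed)
		return true;
	if (p < wp.addr + wp.size && wp.addr < p + size &&
	    (is_write || wp.is_write)) {
		wp.hit = true;	/* tell the delayed access it was raced with */
		return false;
	}
	return true;
}

/* Steps 2 and 3: arm, stall, then look for a hit or a changed value. */
static enum race setup_watchpoint(const void *ptr, size_t size, bool is_write,
				  void (*stall)(void))
{
	unsigned char before[8];
	enum race result = RACE_NONE;

	if (size > sizeof(before))
		return RACE_NONE;
	memcpy(before, ptr, size);	/* value before the delay */
	wp.addr = (uintptr_t)ptr;
	wp.size = size;
	wp.is_write = is_write;
	wp.hit = false;
	wp.armed = true;

	stall();

	if (wp.hit)
		result = RACE_OBSERVED;
	else if (memcmp(before, ptr, size) != 0)
		result = RACE_UNKNOWN_ORIGIN;
	wp.armed = false;
	return result;
}

static int shared;

/* No concurrent activity during the delay. */
static void idle(void)
{
}

/* Instrumented concurrent write: caught by the watchpoint in step 1. */
static void instrumented_writer(void)
{
	check_watchpoint(&shared, sizeof(shared), true);
	shared = 1;
}

/* Uninstrumented write (e.g. DMA): only visible as a value change. */
static void device_writer(void)
{
	shared = 2;
}
```

Passing instrumented_writer as the stall callback models a racing access that
hits the watchpoint, while device_writer models the "unknown origin" case
where only the value change gives the race away.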
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
I think the parameter indentation is a bit off here and below (I also
checked the GitHub diff). Have you considered running checkpatch.pl?
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
I think it should be OK to put the {} on the same line as the function
prototype here; see e.g. include/linux/kasan.h.
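To illustrate the layout I mean, here is a rough sketch of the !CONFIG_KCSAN stubs with the empty body on the prototype line. The function names are taken from this patch; call_all_stubs() is a hypothetical helper that exists only so the snippet can be compiled and exercised in userspace.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch of the !CONFIG_KCSAN stubs, with the empty body kept on
 * the same line as the prototype. Names are from the patch; the layout is
 * only a suggestion.
 */
static inline void kcsan_init(void) { }
static inline void kcsan_disable_current(void) { }
static inline void kcsan_enable_current(void) { }
static inline void kcsan_begin_atomic(bool nest) { }
static inline void kcsan_end_atomic(bool nest) { }
static inline void kcsan_atomic_next(int n) { }

/* Hypothetical helper: shows that every stub is callable as a no-op. */
static inline int call_all_stubs(void)
{
	kcsan_init();
	kcsan_disable_current();
	kcsan_enable_current();
	kcsan_begin_atomic(true);
	kcsan_end_atomic(true);
	kcsan_atomic_next(1);
	return 0;
}
```

This takes the stub section from about 24 lines to 6 without changing behaviour.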
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
Won't ctx->atomic_region suffice for both flat and non-flat regions?
(Do we really need the flat ones?)
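For reference, this is how I read the two-variable scheme from the comment in struct kcsan_ctx. It is a simplified userspace model, not the kernel code: struct ctx_model and the model_* functions are made-up names, and the underflow warning from kcsan_end_atomic() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the two atomic-region fields in struct kcsan_ctx. */
struct ctx_model {
	int atomic_region;       /* depth of nestable regions */
	bool atomic_region_flat; /* true while inside a flat region */
};

static void model_begin_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void model_end_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest)
		--ctx->atomic_region; /* the patch also warns on underflow */
	else
		ctx->atomic_region_flat = false;
}

static bool model_in_atomic(const struct ctx_model *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}

/*
 * A flat region (seqlock reader) inside a nestable region (seqlock writer):
 * ending the flat region must leave the outer region intact.
 */
static bool outer_survives_inner_flat(void)
{
	struct ctx_model ctx = { 0, false };
	bool still_atomic;

	model_begin_atomic(&ctx, true);  /* outer, nestable */
	model_begin_atomic(&ctx, false); /* inner, flat */
	model_end_atomic(&ctx, false);
	still_atomic = model_in_atomic(&ctx);
	model_end_atomic(&ctx, true);

	return still_atomic && !model_in_atomic(&ctx);
}
```

If that reading is right, the open question is only whether any caller actually depends on the flat semantics, or whether they could all use the counter.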
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
You need to use braces in both branches here:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces
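Something along these lines, as a userspace sketch of the shape only: struct addr_list and addr_list_push() are hypothetical names, calloc()/free() stand in for kvmalloc_array()/kvfree(), and the NULL checks are my addition rather than something in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the report_filterlist fields used here. */
struct addr_list {
	unsigned long *addrs; /* array of addresses */
	size_t size;          /* current capacity, must be > 0 */
	size_t used;          /* number of elements used */
};

/* Returns 0 on success, -1 if an allocation fails. */
static int addr_list_push(struct addr_list *l, unsigned long addr)
{
	if (l->addrs == NULL) {
		/* initial allocation */
		l->addrs = calloc(l->size, sizeof(*l->addrs));
		if (l->addrs == NULL)
			return -1;
	} else if (l->used == l->size) {
		/* resize: double the capacity and copy the old entries */
		unsigned long *new_addrs;

		new_addrs = calloc(l->size * 2, sizeof(*new_addrs));
		if (new_addrs == NULL)
			return -1;
		memcpy(new_addrs, l->addrs, l->used * sizeof(*new_addrs));
		free(l->addrs);
		l->addrs = new_addrs;
		l->size *= 2;
	}

	l->addrs[l->used++] = addr;
	return 0;
}
```

The point is just that once one branch needs braces, the other branch gets them too, even if it is a single statement.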
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
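To make the encoding easier to reason about, here is a userspace model of encode_watchpoint()/decode_watchpoint(). It is only a sketch: the 64-bit word, the 4 KiB page size, the MASK() helper and the use of uint64_t in place of long are my assumptions; the mask expressions themselves are copied from the header above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: 64-bit long, 4 KiB pages, KCSAN_CHECK_ADJACENT 1. */
#define BITS_PER_LONG		64
#define MAX_ENCODABLE_SIZE	((size_t)4096 * (1 + 1))
#define SIZE_BITS		14	/* bits_per(8192) */
#define ADDR_BITS		(BITS_PER_LONG - 1 - SIZE_BITS)

/* Userspace stand-in for the kernel's GENMASK(h, l): bits h..l set. */
#define MASK(h, l) \
	((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

#define WRITE_MASK	(UINT64_C(1) << (BITS_PER_LONG - 1))
#define SIZE_MASK	MASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - SIZE_BITS)
#define ADDR_MASK	MASK(BITS_PER_LONG - 3 - SIZE_BITS, 0)

#define INVALID_WATCHPOINT	0
#define CONSUMED_WATCHPOINT	1

/* Pack is-write, size and the low address bits into one word. */
static uint64_t encode_watchpoint(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= MAX_ENCODABLE_SIZE);	/* same bound as check_encodable() */
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Unpack a watchpoint; the two reserved values decode to "nothing". */
static bool decode_watchpoint(uint64_t wp, uint64_t *addr_masked,
			      size_t *size, bool *is_write)
{
	if (wp == INVALID_WATCHPOINT || wp == CONSUMED_WATCHPOINT)
		return false;

	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

Under these assumptions ADDR_MASK keeps bits 47..0 and the size is stored from bit 49 upward, so two addresses that agree in their low 48 bits encode identically; that is the false-positive case the comment describes. SIZE_MASK as written also covers bit 48, which encode_watchpoint() never sets, so the round trip is unaffected.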
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
Labels in enums should be capitalized:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl
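
For illustration, a capitalized version of the counter IDs could look like the following. The upper-case names are only a suggestion (they are not in the patch); the members and their order are taken from the enum quoted below, and counter_inc() is a hypothetical userspace stand-in for kcsan_counter_inc().

```c
#include <assert.h>

/* Hypothetical upper-case spelling of the labels in enum kcsan_counter_id. */
enum kcsan_counter_id {
	KCSAN_COUNTER_USED_WATCHPOINTS,		/* watchpoints currently in use */
	KCSAN_COUNTER_SETUP_WATCHPOINTS,	/* total watchpoints set up */
	KCSAN_COUNTER_DATA_RACES,
	KCSAN_COUNTER_NO_CAPACITY,
	KCSAN_COUNTER_REPORT_RACES,
	KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN,
	KCSAN_COUNTER_UNENCODABLE_ACCESSES,
	KCSAN_COUNTER_ENCODING_FALSE_POSITIVES,

	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* The trailing member still sizes any per-counter array. */
static long counters[KCSAN_COUNTER_COUNT];

/* Non-atomic stand-in for kcsan_counter_inc(), good enough for a sketch. */
static void counter_inc(enum kcsan_counter_id id)
{
	assert(id < KCSAN_COUNTER_COUNT);
	counters[id]++;
}
```

The same renaming would apply to enum kcsan_report_type further down.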
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
FWIW another option is to put all KCSAN-related functions in a
separate code section and check if the function addresses are in the
address range belonging to that section.
This will work even with non-symbolized stacks.
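
A rough sketch of what I mean, written as plain C so it can be tried outside the kernel. Everything here is hypothetical: struct text_range, its bounds and get_stack_skipnr_by_range() are made-up names, and in the kernel the bounds would presumably come from linker-script symbols for the new section rather than from a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bounds of a dedicated KCSAN text section. In the kernel these
 * would come from the linker script; here they are plain values.
 */
struct text_range {
	unsigned long start;	/* first address in the section */
	unsigned long end;	/* one past the last address */
};

static bool in_range(const struct text_range *r, unsigned long addr)
{
	assert(r->start <= r->end);
	return addr >= r->start && addr < r->end;
}

/* Number of leading stack entries that lie inside the KCSAN section. */
static int get_stack_skipnr_by_range(const struct text_range *kcsan_text,
				     const unsigned long stack_entries[],
				     int num_entries)
{
	int skip = 0;

	while (skip < num_entries && in_range(kcsan_text, stack_entries[skip]))
		++skip;
	return skip;
}
```

Note that the string match above also skips "_once_size" helpers, which would not live in a KCSAN section, so those would still need separate handling.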
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
IIRC checkpatch.pl requires all SPDX headers to look like this one
(C++-style, not C-style).
Please double check and fix the headers in other files if necessary.

This file could also use some comments; right now it's not easy to
understand what it's testing.
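
As an example of the kind of comments I have in mind, here is test_matching_access() annotated, in a form that builds in userspace. matching_access() is copied from encoding.h; check_matching_access() is my own name, and assert() stands in for WARN_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy of matching_access() from encoding.h: do two byte ranges overlap? */
static bool matching_access(unsigned long addr1, size_t size1,
			    unsigned long addr2, size_t size2)
{
	unsigned long end_range1 = addr1 + size1 - 1;
	unsigned long end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}

/* Same cases as test_matching_access(), with the intent spelled out. */
static void check_matching_access(void)
{
	/* Identical single-byte accesses overlap. */
	assert(matching_access(10, 1, 10, 1));
	/* The range [10, 11] contains byte 11. */
	assert(matching_access(10, 2, 11, 1));
	/* The range [9, 10] contains byte 10; lower address passed second. */
	assert(matching_access(10, 1, 9, 2));
	/* Adjacent bytes 10 and 11 do not overlap. */
	assert(!matching_access(10, 1, 11, 1));
	/* Adjacent bytes 9 and 10 do not overlap either. */
	assert(!matching_access(9, 1, 10, 1));
}
```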
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
Isn't it better to decide at runtime whether we want to ignore atomics?

> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
Overall, I think it's better to make most of these configs boot-time flags.
This way one won't need to rebuild the kernel every time they want to
turn some option on or off.
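
To make the suggestion concrete, here is a rough user-space sketch (not
kernel code) of options that are parsed once from a command-line token
instead of being fixed at build time. The key names "kcsan.ignore_atomics"
and "kcsan.skip_inst", the variables, and kcsan_parse_opt() are all
hypothetical; in the kernel this would presumably be wired up through
early_param() or module_param() rather than hand-rolled parsing.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime options, seeded with the Kconfig defaults quoted above. */
static bool kcsan_ignore_atomics;             /* KCSAN_IGNORE_ATOMICS: default n */
static unsigned long kcsan_skip_inst = 2000;  /* KCSAN_WATCH_SKIP_INST: default 2000 */

/* Return true if the first keylen bytes of tok spell exactly name. */
static bool key_is(const char *tok, size_t keylen, const char *name)
{
	return strlen(name) == keylen && strncmp(tok, name, keylen) == 0;
}

/*
 * Parse one "key=value" token, e.g. "kcsan.skip_inst=500".
 * Returns 0 on success, -1 if the key is unknown or the value is not a
 * plain decimal number.
 */
static int kcsan_parse_opt(const char *tok)
{
	const char *eq = strchr(tok, '=');
	unsigned long val;
	size_t keylen;
	char *end;

	if (!eq || eq[1] == '\0')
		return -1;
	val = strtoul(eq + 1, &end, 10);
	if (*end != '\0')
		return -1;

	keylen = (size_t)(eq - tok);
	if (key_is(tok, keylen, "kcsan.ignore_atomics"))
		kcsan_ignore_atomics = (val != 0);
	else if (key_is(tok, keylen, "kcsan.skip_inst"))
		kcsan_skip_inst = val;
	else
		return -1;
	return 0;
}
```

With something along these lines, trying a different skip count or toggling
atomics becomes a reboot with a changed command line rather than a rebuild.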

> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
"KernelConcurrencySanitizer" or "Kernel Concurrency Sanitizer", maybe?
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
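
For readers less familiar with this Makefile idiom, the decision it encodes
can be modelled in a few lines of C. This is only an illustration of my
reading of the expression: the per-object setting, the per-directory
setting, and a trailing "y" are concatenated, and the flags are dropped when
the result starts with "n". The function name kcsan_instrument() is made up,
and unset make variables are modelled as empty strings.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Models:
 *   $(if $(patsubst n%,, $(KCSAN_SANITIZE_obj.o)$(KCSAN_SANITIZE)y), $(CFLAGS_KCSAN))
 *
 * per_obj: value of KCSAN_SANITIZE_<obj>.o ("y", "n", or "" if unset)
 * per_dir: value of KCSAN_SANITIZE         ("y", "n", or "" if unset)
 *
 * Returns true if CFLAGS_KCSAN would be added for the object.
 */
static bool kcsan_instrument(const char *per_obj, const char *per_dir)
{
	char word[64];

	/* The trailing "y" makes "instrument" the default when both are unset. */
	snprintf(word, sizeof(word), "%s%sy", per_obj, per_dir);

	/* patsubst n%,, empties any word starting with 'n'. */
	return word[0] != 'n';
}
```

So the per-object variable wins over the per-directory one, which in turn
wins over the default; e.g. KCSAN_SANITIZE_random32.o := n in lib/Makefile
disables instrumentation for that one object only.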
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


-- 
Alexander Potapenko
Software Engineer

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 2/8] objtool, kcsan: Add KCSAN runtime functions to whitelist
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-21 15:15     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-21 15:15 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> This patch adds KCSAN runtime functions to the objtool whitelist.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  tools/objtool/check.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 044c9a3cb247..d1acc867b43c 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -466,6 +466,23 @@ static const char *uaccess_safe_builtin[] = {
>         "__asan_report_store4_noabort",
>         "__asan_report_store8_noabort",
>         "__asan_report_store16_noabort",
> +       /* KCSAN */
> +       "__kcsan_check_watchpoint",
> +       "__kcsan_setup_watchpoint",
> +       /* KCSAN/TSAN out-of-line */

There is no TSAN in-line instrumentation.

> +       "__tsan_func_entry",
> +       "__tsan_func_exit",
> +       "__tsan_read_range",

There is also __tsan_write_range(), right? Isn't it safer to add it right away?

> +       "__tsan_read1",
> +       "__tsan_read2",
> +       "__tsan_read4",
> +       "__tsan_read8",
> +       "__tsan_read16",
> +       "__tsan_write1",
> +       "__tsan_write2",
> +       "__tsan_write4",
> +       "__tsan_write8",
> +       "__tsan_write16",
>         /* KCOV */
>         "write_comp_data",
>         "__sanitizer_cov_trace_pc",
> --
> 2.23.0.866.gb869b98d4c-goog
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 2/8] objtool, kcsan: Add KCSAN runtime functions to whitelist
  2019-10-21 15:15     ` Dmitry Vyukov
  (?)
@ 2019-10-21 15:43       ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-21 15:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Mon, 21 Oct 2019 at 17:15, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> >
> > This patch adds KCSAN runtime functions to the objtool whitelist.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  tools/objtool/check.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> > index 044c9a3cb247..d1acc867b43c 100644
> > --- a/tools/objtool/check.c
> > +++ b/tools/objtool/check.c
> > @@ -466,6 +466,23 @@ static const char *uaccess_safe_builtin[] = {
> >         "__asan_report_store4_noabort",
> >         "__asan_report_store8_noabort",
> >         "__asan_report_store16_noabort",
> > +       /* KCSAN */
> > +       "__kcsan_check_watchpoint",
> > +       "__kcsan_setup_watchpoint",
> > +       /* KCSAN/TSAN out-of-line */
>
> There is no TSAN in-line instrumentation.

Done @ v3.

> > +       "__tsan_func_entry",
> > +       "__tsan_func_exit",
> > +       "__tsan_read_range",
>
> There is also __tsan_write_range(), right? Isn't it safer to add it right away?

Added all missing functions for v3.

Many thanks for the comments!


> > +       "__tsan_read1",
> > +       "__tsan_read2",
> > +       "__tsan_read4",
> > +       "__tsan_read8",
> > +       "__tsan_read16",
> > +       "__tsan_write1",
> > +       "__tsan_write2",
> > +       "__tsan_write4",
> > +       "__tsan_write8",
> > +       "__tsan_write16",
> >         /* KCOV */
> >         "write_comp_data",
> >         "__sanitizer_cov_trace_pc",
> > --
> > 2.23.0.866.gb869b98d4c-goog
> >

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-21 13:37     ` Alexander Potapenko
  (?)
  (?)
@ 2019-10-21 15:54       ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-21 15:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern, Andrea Parri,
	Andrey Konovalov, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Boqun Feng, Borislav Petkov, Daniel Axtens, Daniel Lustig,
	dave.hansen, David Howells, Dmitriy Vyukov, H. Peter Anvin,
	Ingo Molnar, Jade Alglave, Joel Fernandes, Jonathan Corbet,
	Josh Poimboeuf, Luc Maranget, Mark Rutland, Nicholas Piggin,
	Paul McKenney, Peter Zijlstra, Thomas Gleixner, Will Deacon,
	kasan-dev, linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On Mon, 21 Oct 2019 at 15:37, Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> >
> > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> > kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> > See the included Documentation/dev-tools/kcsan.rst for more details.
> >
> > This patch adds basic infrastructure, but does not yet enable KCSAN for
> > any architecture.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Elaborate comment about instrumentation calls emitted by compilers.
> > * Replace kcsan_check_access(.., {true, false}) with
> >   kcsan_check_{read,write} for improved readability.
> > * Change bug title of race of unknown origin to just say "data-race in".
> > * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> > * Add comment about safety of find_watchpoint without user_access_save.
> > * Remove unnecessary preempt_disable/enable and elaborate on comment why
> >   we want to disable interrupts and preemptions.
> > * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
> >   contexts [Suggested by Mark Rutland].
> > ---
> >  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
> >  MAINTAINERS                       |  11 +
> >  Makefile                          |   3 +-
> >  include/linux/compiler-clang.h    |   9 +
> >  include/linux/compiler-gcc.h      |   7 +
> >  include/linux/compiler.h          |  35 ++-
> >  include/linux/kcsan-checks.h      | 147 ++++++++++
> >  include/linux/kcsan.h             | 108 ++++++++
> >  include/linux/sched.h             |   4 +
> >  init/init_task.c                  |   8 +
> >  init/main.c                       |   2 +
> >  kernel/Makefile                   |   1 +
> >  kernel/kcsan/Makefile             |  14 +
> >  kernel/kcsan/atomic.c             |  21 ++
> >  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
> >  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
> >  kernel/kcsan/encoding.h           |  94 +++++++
> >  kernel/kcsan/kcsan.c              |  86 ++++++
> >  kernel/kcsan/kcsan.h              | 140 ++++++++++
> >  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
> >  kernel/kcsan/test.c               | 117 ++++++++
> >  lib/Kconfig.debug                 |   2 +
> >  lib/Kconfig.kcsan                 |  88 ++++++
> >  lib/Makefile                      |   3 +
> >  scripts/Makefile.kcsan            |   6 +
> >  scripts/Makefile.lib              |  10 +
> >  26 files changed, 2069 insertions(+), 9 deletions(-)
> >  create mode 100644 Documentation/dev-tools/kcsan.rst
> >  create mode 100644 include/linux/kcsan-checks.h
> >  create mode 100644 include/linux/kcsan.h
> >  create mode 100644 kernel/kcsan/Makefile
> >  create mode 100644 kernel/kcsan/atomic.c
> >  create mode 100644 kernel/kcsan/core.c
> >  create mode 100644 kernel/kcsan/debugfs.c
> >  create mode 100644 kernel/kcsan/encoding.h
> >  create mode 100644 kernel/kcsan/kcsan.c
> >  create mode 100644 kernel/kcsan/kcsan.h
> >  create mode 100644 kernel/kcsan/report.c
> >  create mode 100644 kernel/kcsan/test.c
> >  create mode 100644 lib/Kconfig.kcsan
> >  create mode 100644 scripts/Makefile.kcsan
> >
> > diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> > new file mode 100644
> > index 000000000000..497b09e5cc96
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kcsan.rst
> > @@ -0,0 +1,203 @@
> > +The Kernel Concurrency Sanitizer (KCSAN)
> > +========================================
> > +
> > +Overview
> > +--------
> > +
> > +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> > +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> > +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> > +detector. Key priorities in KCSAN's design are lack of false positives,
> > +scalability, and simplicity. More details can be found in `Implementation
> > +Details`_.
> > +
> > +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> > +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> > +With Clang it requires version 7.0.0 or later.
> > +
> > +Usage
> > +-----
> > +
> > +To enable KCSAN configure kernel with::
> > +
> > +    CONFIG_KCSAN = y
> > +
> > +KCSAN provides several other configuration options to customize behaviour (see
> > +their respective help text for more info).
> > +
> > +debugfs
> > +~~~~~~~
> > +
> > +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> > +
> > +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> > +  ``/sys/kernel/debug/kcsan``.
> > +
> > +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> > +  ``some_func_name`` to the report filter list, which (by default) blacklists
> > +  reporting data-races where either one of the top stackframes are a function
> > +  in the list.
> > +
> > +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> > +  changes the report filtering behaviour. For example, the blacklist feature
> > +  can be used to silence frequently occurring data-races; the whitelist feature
> > +  can help with reproduction and testing of fixes.
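
As an illustration of the debugfs interface described above, a test harness
could drive it with a small helper like the one below. This is a sketch
under assumptions: the helper name kcsan_debugfs_write() is made up, the
path is passed in by the caller, and I am assuming the file accepts each
command as a single plain write, as the documentation text suggests.

```c
#include <stdio.h>

/*
 * Write one command string (e.g. "on", "off", "whitelist",
 * "!some_func_name") to the given control file.
 * Returns 0 on success, -1 on any I/O error.
 */
static int kcsan_debugfs_write(const char *path, const char *cmd)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes; a failed flush means the write did not land. */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, kcsan_debugfs_write("/sys/kernel/debug/kcsan", "off") would
pause reporting, and passing "!some_func_name" would add that function to
the filter list.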
> > +
> > +Error reports
> > +~~~~~~~~~~~~~
> > +
> > +A typical data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> > +
> > +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> > +     kernfs_refresh_inode+0x70/0x170
> > +     kernfs_iop_permission+0x4f/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     vfs_statx+0x9b/0x130
> > +     __do_sys_newlstat+0x50/0xb0
> > +     __x64_sys_newlstat+0x37/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> > +     generic_permission+0x5b/0x2a0
> > +     kernfs_iop_permission+0x66/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     do_faccessat+0x11a/0x390
> > +     __x64_sys_access+0x3c/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +The header of the report provides a short summary of the functions involved in
> > +the race. It is followed by the access types and stack traces of the 2 threads
> > +involved in the data-race.
> > +
> > +The other less common type of data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> > +
> > +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> > +     e1000_clean_rx_irq+0x551/0xb10
> > +     e1000_clean+0x533/0xda0
> > +     net_rx_action+0x329/0x900
> > +     __do_softirq+0xdb/0x2db
> > +     irq_exit+0x9b/0xa0
> > +     do_IRQ+0x9c/0xf0
> > +     ret_from_intr+0x0/0x18
> > +     default_idle+0x3f/0x220
> > +     arch_cpu_idle+0x21/0x30
> > +     do_idle+0x1df/0x230
> > +     cpu_startup_entry+0x14/0x20
> > +     rest_init+0xc5/0xcb
> > +     arch_call_rest_init+0x13/0x2b
> > +     start_kernel+0x6db/0x700
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +This report is generated where it was not possible to determine the other
> > +racing thread, but a race was inferred due to the data-value of the watched
> > +memory location having changed. These can occur either due to missing
> > +instrumentation or e.g. DMA accesses.
> > +
> > +Data-Races
> > +----------
> Nit: I was under the impression "data races" were commonly written
> without a hyphen. I may be mistaken.

Thanks. I've updated it everywhere except in bug titles, which should
remain as-is.

> > +
> > +Informally, two operations *conflict* if they access the same memory location,
> > +and at least one of them is a write operation. In an execution, two memory
> > +operations from different threads form a **data-race** if they *conflict*, at
> > +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> > +the "happens-before" order according to the `LKMM
> > +<../../tools/memory-model/Documentation/explanation.txt>`_.
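
The definition above can be written down as a small predicate, which may
help when checking one's reading of it. This is a simplified sketch: the
struct and function names are invented, accesses are compared by exact
address only (no sizes or partial overlap), and whether two accesses are
ordered by happens-before is supplied by the caller rather than derived from
the LKMM.

```c
#include <stdbool.h>
#include <stdint.h>

struct access {
	uintptr_t addr;   /* memory location accessed */
	int thread;       /* id of the accessing thread */
	bool is_write;
	bool is_plain;    /* false for marked accesses such as READ_ONCE() */
};

/* Two accesses conflict if they hit the same location and one is a write. */
static bool conflict(const struct access *a, const struct access *b)
{
	return a->addr == b->addr && (a->is_write || b->is_write);
}

/*
 * Data race: accesses from different threads that conflict, where at least
 * one is plain, and which are not ordered by happens-before.
 */
static bool is_data_race(const struct access *a, const struct access *b,
			 bool ordered)
{
	return a->thread != b->thread && conflict(a, b) &&
	       (a->is_plain || b->is_plain) && !ordered;
}
```

Note that two marked accesses never form a data race under this definition,
even if they conflict and are unordered.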
> > +
> > +Relationship with the Linux Kernel Memory Model (LKMM)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The LKMM defines the propagation and ordering rules of various memory
> > +operations, which gives developers the ability to reason about concurrent code.
> > +Ultimately this allows to determine the possible executions of concurrent code,
> > +and if that code is free from data-races.
> > +
> > +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> > +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> > +words, KCSAN assumes that as long as a plain access is not observed to race
> > +with another conflicting access, memory operations are correctly ordered.
> > +
> > +This means that KCSAN will not report *potential* data-races due to missing
> > +memory ordering. If, however, missing memory ordering (that is observable with
> > +a particular compiler and architecture) leads to an observable data-race (e.g.
> > +entering a critical section erroneously), KCSAN would report the resulting
> > +data-race.
> > +
> > +Implementation Details
> > +----------------------
> > +
> > +The general approach is inspired by `DataCollider
> > +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> > +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> > +relies on compiler instrumentation. Watchpoints are implemented using an
> > +efficient encoding that stores access type, size, and address in a long; the
> > +benefits of using "soft watchpoints" are portability and greater flexibility in
> > +limiting which accesses trigger a watchpoint.
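
To illustrate what "access type, size, and address in a long" can look
like, here is one possible packing for a 64-bit word. The bit layout and all
names are assumptions chosen for the example and are not taken from
kernel/kcsan/encoding.h: the top bit holds the access type, the next 14 bits
the size, and the remaining 49 bits the low part of the address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_BITS  14
#define WP_ADDR_BITS  (63 - WP_SIZE_BITS)
#define WP_SIZE_MASK  ((UINT64_C(1) << WP_SIZE_BITS) - 1)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_ADDR_BITS) - 1)

/* Pack access type, size and (truncated) address into one 64-bit word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       (((uint64_t)size & WP_SIZE_MASK) << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack a word produced by wp_encode(); the address comes back truncated. */
static void wp_decode(uint64_t wp, uint64_t *addr, size_t *size, bool *is_write)
{
	*addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_ADDR_BITS) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
}
```

Because the whole watchpoint fits in a single word, it can be installed,
read, and cleared with single-word atomic operations, which is what makes a
lock-free fast path plausible. The price in this sketch is that the upper
address bits are dropped, so two addresses that differ only in those bits
would alias.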
> > +
> > +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> > +memory operations; for each instrumented plain access:
> > +
> > +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> > +   write, then we encountered a racing access.
> > +
> > +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> > +   stall some delay.
> > +
> > +3. Also check the data value before the delay, and re-check the data value
> > +   after delay; if the values mismatch, we infer a race of unknown origin.
> > +
> > +To detect data-races between plain and atomic memory operations, KCSAN also
> > +annotates atomic accesses, but only to check if a watchpoint exists
> > +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> > +accesses.
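
The three steps, plus the check-only rule for atomics, can be modelled in a
single-threaded toy as below. Everything here is a simplification for
illustration and not the code in kernel/kcsan/core.c: the delay is a
callback (delay_hook) so that a "concurrent" access can be simulated, the
watchpoint table is a tiny array, SKIP_COUNT stands in for
KCSAN_WATCH_SKIP_INST, and races are merely counted. The two helpers at the
end simulate an instrumented racing write and an uninstrumented one (think
DMA).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WATCHPOINTS 4
#define SKIP_COUNT      3	/* stand-in for KCSAN_WATCH_SKIP_INST */

struct watchpoint {
	const volatile uint32_t *addr;	/* NULL means the slot is free */
	bool is_write;
};

static struct watchpoint watchpoints[NUM_WATCHPOINTS];
static unsigned int skip_counter;
static unsigned int races_known;	/* both accesses were observed */
static unsigned int races_unknown;	/* inferred from a value change */
static void (*delay_hook)(void);	/* runs while the watchpoint is armed */

/* Linear search; looking up NULL finds a free slot. */
static struct watchpoint *find_watchpoint(const volatile uint32_t *addr)
{
	for (size_t i = 0; i < NUM_WATCHPOINTS; i++)
		if (watchpoints[i].addr == addr)
			return &watchpoints[i];
	return NULL;
}

/* Called for every instrumented access. */
static void check_access(const volatile uint32_t *addr, bool is_write,
			 bool is_atomic)
{
	struct watchpoint *wp = find_watchpoint(addr);
	unsigned int seen;
	uint32_t before;

	/* Step 1: matching watchpoint and at least one write => race. */
	if (wp) {
		if (is_write || wp->is_write)
			races_known++;
		return;
	}

	/* Atomic accesses only check; they never set up a watchpoint. */
	if (is_atomic)
		return;

	/* Step 2: only every SKIP_COUNT-th plain access arms a watchpoint. */
	if (++skip_counter < SKIP_COUNT)
		return;
	skip_counter = 0;

	wp = find_watchpoint(NULL);
	if (!wp)
		return;		/* table full: skip this sample */
	wp->addr = addr;
	wp->is_write = is_write;

	/* Step 3: snapshot the value, stall, then re-check it. */
	before = *addr;
	seen = races_known;
	if (delay_hook)
		delay_hook();
	if (*addr != before && races_known == seen)
		races_unknown++;	/* changed, but no access was seen */

	wp->addr = NULL;	/* disarm */
}

/* Simulated concurrent accesses for experimenting with the sketch. */
static uint32_t shared_var;

static void racing_write(void)	/* instrumented plain write */
{
	check_access(&shared_var, true, false);
	shared_var++;
}

static void device_write(void)	/* uninstrumented write, e.g. DMA */
{
	shared_var++;
}
```

With delay_hook set to racing_write, the third plain read of shared_var arms
a watchpoint and the write that lands during the delay is counted as a known
race. With device_write, no second access is ever observed, and the changed
value is what leads to a race of unknown origin being counted.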
> > +
> > +Key Properties
> > +~~~~~~~~~~~~~~
> > +
> > +1. **Memory Overhead:** No shadow memory is required. The current
> > +   implementation uses a small array of longs to encode watchpoint information,
> > +   which is negligible.
> > +
> > +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> > +   efficient watchpoint encoding that does not require acquiring any shared
> > +   locks in the fast-path. For kernel boot with a default config on a system
> > +   where nproc=8 we measure a slow-down of 10-15x.
> > +
> > +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> > +   may result in missed data-races (false negatives), compared to a
> > +   happens-before data-race detector.
> > +
> > +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> > +
> > +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> > +   runtime. With a happens-before data-race detector, any omission leads to
> > +   false positives, which is especially important in the context of the kernel
> > +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> > +   result, maintenance overheads are minimal as the kernel evolves.
> > +
> > +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> > +   setting up watchpoints, racy writes from devices can also be detected.
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0154674cbad3..71f7fb625490 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
> >  F:     scripts/kconfig/
> >  F:     scripts/Kconfig.include
> >
> > +KCSAN
> > +M:     Marco Elver <elver@google.com>
> > +R:     Dmitry Vyukov <dvyukov@google.com>
> > +L:     kasan-dev@googlegroups.com
> > +S:     Maintained
> > +F:     Documentation/dev-tools/kcsan.rst
> > +F:     include/linux/kcsan*.h
> > +F:     kernel/kcsan/
> > +F:     lib/Kconfig.kcsan
> > +F:     scripts/Makefile.kcsan
> > +
> >  KDUMP
> >  M:     Dave Young <dyoung@redhat.com>
> >  M:     Baoquan He <bhe@redhat.com>
> > diff --git a/Makefile b/Makefile
> > index ffd7a912fc46..ad4729176252 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
> >
> >  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
> >  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> > -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> > +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
> >  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
> >  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
> >  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> > @@ -900,6 +900,7 @@ endif
> >  include scripts/Makefile.kasan
> >  include scripts/Makefile.extrawarn
> >  include scripts/Makefile.ubsan
> > +include scripts/Makefile.kcsan
> >
> >  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
> >  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > index 333a6695a918..a213eb55e725 100644
> > --- a/include/linux/compiler-clang.h
> > +++ b/include/linux/compiler-clang.h
> > @@ -24,6 +24,15 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_feature(thread_sanitizer)
> > +/* emulate gcc's __SANITIZE_THREAD__ flag */
> > +#define __SANITIZE_THREAD__
> > +#define __no_sanitize_thread \
> > +               __attribute__((no_sanitize("thread")))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  /*
> >   * Not all versions of clang implement the the type-generic versions
> >   * of the builtin overflow checkers. Fortunately, clang implements
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index d7ee4c6bad48..de105ca29282 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -145,6 +145,13 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> > +#define __no_sanitize_thread                                                   \
> > +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  #if GCC_VERSION >= 50100
> >  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
> >  #endif
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 5e88e7e33abe..350d80dbee4d 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >  #endif
> >
> >  #include <uapi/linux/types.h>
> > +#include <linux/kcsan-checks.h>
> >
> >  #define __READ_ONCE_SIZE                                               \
> >  ({                                                                     \
> > @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >         }                                                               \
> >  })
> >
> > -static __always_inline
> > -void __read_once_size(const volatile void *p, void *res, int size)
> > -{
> > -       __READ_ONCE_SIZE;
> > -}
> > -
> >  #ifdef CONFIG_KASAN
> >  /*
> >   * We can't declare function 'inline' because __no_sanitize_address confilcts
> > @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
> >  # define __no_kasan_or_inline __always_inline
> >  #endif
> >
> > -static __no_kasan_or_inline
> > +#ifdef CONFIG_KCSAN
> > +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +# define __no_kcsan_or_inline __always_inline
> > +#endif
> > +
> > +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> > +/* Avoid any instrumentation or inline. */
> > +#define __no_sanitize_or_inline                                                \
> > +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +#define __no_sanitize_or_inline __always_inline
> > +#endif
> > +
> > +static __no_kcsan_or_inline
> > +void __read_once_size(const volatile void *p, void *res, int size)
> > +{
> > +       kcsan_check_atomic_read((const void *)p, size);
> > +       __READ_ONCE_SIZE;
> > +}
> > +
> > +static __no_sanitize_or_inline
> >  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
> >  {
> >         __READ_ONCE_SIZE;
> >  }
> >
> > -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> > +static __no_kcsan_or_inline
> > +void __write_once_size(volatile void *p, void *res, int size)
> >  {
> > +       kcsan_check_atomic_write((const void *)p, size);
> > +
> >         switch (size) {
> >         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> >         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> > diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> > new file mode 100644
> > index 000000000000..4203603ae852
> > --- /dev/null
> > +++ b/include/linux/kcsan-checks.h
> > @@ -0,0 +1,147 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_CHECKS_H
> > +#define _LINUX_KCSAN_CHECKS_H
> > +
> > +#include <linux/types.h>
> > +
> > +/*
> > + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> > + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> > + * to validate access to an address.   Never use these in header files!
> > + */
> > +#ifdef CONFIG_KCSAN
> > +/**
> > + * __kcsan_check_watchpoint - check if a watchpoint exists
> > + *
> > + * Returns true if no race was detected, and we may then proceed to set up a
> > + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> > + * encountered, and we may not set up a watchpoint after.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + * @return true if no race was detected, false otherwise.
> > + */
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> I think the parameter indentations are a bit off here and below (I've
> also looked at the Github diff);
> have you considered running checkpatch.pl?

It was formatted with clang-format; the indentation is correct with
8-space tabs. checkpatch.pl is happy.

> > +
> > +/**
> > + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> > + *
> > + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> > + * reports the data-race.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + */
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> > +#else
> > +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/*
> > + * kcsan_*: Only available when the particular compilation unit has KCSAN
> > + * instrumentation enabled. May be used in header files.
> > + */
> > +#ifdef __SANITIZE_THREAD__
> > +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> > +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> > +#else
> > +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/**
> > + * __kcsan_check_read - check regular read access for data-races
> > + *
> > + * Full read access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled. Note that, setting up watchpoints for plain reads is
> > + * required to also detect data-races with atomic accesses.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_read(ptr, size)                                          \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> > +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> > +       } while (0)
> > +
> > +/**
> > + * __kcsan_check_write - check regular write access for data-races
> > + *
> > + * Full write access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_write(ptr, size)                                         \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_read - check regular read access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_read(ptr, size)                                            \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> > +                       kcsan_setup_watchpoint(ptr, size, false);              \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_write - check regular write access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_write(ptr, size)                                           \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       kcsan_setup_watchpoint(ptr, size, true);               \
> > +       } while (0)
> > +
> > +/*
> > + * Check for atomic accesses: if atomic access are not ignored, this simply
> > + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> > + */
> > +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> > +#define kcsan_check_atomic_read(...)                                           \
> > +       do {                                                                   \
> > +       } while (0)
> > +#define kcsan_check_atomic_write(...)                                          \
> > +       do {                                                                   \
> > +       } while (0)
> > +#else
> > +#define kcsan_check_atomic_read(ptr, size)                                     \
> > +       kcsan_check_watchpoint(ptr, size, false)
> > +#define kcsan_check_atomic_write(ptr, size)                                    \
> > +       kcsan_check_watchpoint(ptr, size, true)
> > +#endif
> > +
> > +#endif /* _LINUX_KCSAN_CHECKS_H */
> > diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> > new file mode 100644
> > index 000000000000..fd5de2ba3a16
> > --- /dev/null
> > +++ b/include/linux/kcsan.h
> > @@ -0,0 +1,108 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_H
> > +#define _LINUX_KCSAN_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +#ifdef CONFIG_KCSAN
> > +
> > +/*
> > + * Context for each thread of execution: for tasks, this is stored in
> > + * task_struct, and interrupts access internal per-CPU storage.
> > + */
> > +struct kcsan_ctx {
> > +       int disable; /* disable counter */
> > +       int atomic_next; /* number of following atomic ops */
> > +
> > +       /*
> > +        * We use separate variables to store if we are in a nestable or flat
> > +        * atomic region. This helps make sure that an atomic region with
> > +        * nesting support is not suddenly aborted when a flat region is
> > +        * contained within. Effectively this allows supporting nesting flat
> > +        * atomic regions within an outer nestable atomic region. Support for
> > +        * this is required as there are cases where a seqlock reader critical
> > +        * section (flat atomic region) is contained within a seqlock writer
> > +        * critical section (nestable atomic region), and the "mismatching
> > +        * kcsan_end_atomic()" warning would trigger otherwise.
> > +        */
> > +       int atomic_region;
> > +       bool atomic_region_flat;
> > +};
> > +
> > +/**
> > + * kcsan_init - initialize KCSAN runtime
> > + */
> > +void kcsan_init(void);
> > +
> > +/**
> > + * kcsan_disable_current - disable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_disable_current(void);
> > +
> > +/**
> > + * kcsan_enable_current - re-enable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_enable_current(void);
> > +
> > +/**
> > + * kcsan_begin_atomic - use to denote an atomic region
> > + *
> > + * Accesses within the atomic region may appear to race with other accesses but
> > + * should be considered atomic.
> > + *
> > + * @nest true if regions may be nested, or false for flat region
> > + */
> > +void kcsan_begin_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_end_atomic - end atomic region
> > + *
> > + * @nest must match argument to kcsan_begin_atomic().
> > + */
> > +void kcsan_end_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_atomic_next - consider following accesses as atomic
> > + *
> > + * Force treating the next n memory accesses for the current context as atomic
> > + * operations.
> > + *
> > + * @n number of following memory accesses to treat as atomic.
> > + */
> > +void kcsan_atomic_next(int n);
> > +
> > +#else /* CONFIG_KCSAN */
> > +
> > +static inline void kcsan_init(void)
> I think it should be ok to put {} on the same line with the function
> prototype here, see e.g. include/linux/kasan.h

Done @ v3.

> > +{
> > +}
> > +
> > +static inline void kcsan_disable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_enable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_begin_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_end_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_atomic_next(int n)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_KCSAN */
> > +
> > +#endif /* _LINUX_KCSAN_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 2c2e56bd8913..9490e417bf4a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -31,6 +31,7 @@
> >  #include <linux/task_io_accounting.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/rseq.h>
> > +#include <linux/kcsan.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> >  struct audit_context;
> > @@ -1171,6 +1172,9 @@ struct task_struct {
> >  #ifdef CONFIG_KASAN
> >         unsigned int                    kasan_depth;
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       struct kcsan_ctx                kcsan_ctx;
> > +#endif
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         /* Index of current stored address in ret_stack: */
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 9e5cbe5eab7b..e229416c3314 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -161,6 +161,14 @@ struct task_struct init_task
> >  #ifdef CONFIG_KASAN
> >         .kasan_depth    = 1,
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       .kcsan_ctx = {
> > +               .disable                = 1,
> > +               .atomic_next            = 0,
> > +               .atomic_region          = 0,
> > +               .atomic_region_flat     = 0,
> > +       },
> > +#endif
> >  #ifdef CONFIG_TRACE_IRQFLAGS
> >         .softirqs_enabled = 1,
> >  #endif
> > diff --git a/init/main.c b/init/main.c
> > index 91f6ebb30ef0..4d814de017ee 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -93,6 +93,7 @@
> >  #include <linux/rodata_test.h>
> >  #include <linux/jump_label.h>
> >  #include <linux/mem_encrypt.h>
> > +#include <linux/kcsan.h>
> >
> >  #include <asm/io.h>
> >  #include <asm/bugs.h>
> > @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
> >         acpi_subsystem_init();
> >         arch_post_acpi_subsys_init();
> >         sfi_init_late();
> > +       kcsan_init();
> >
> >         /* Do the rest non-__init'ed, we're now alive */
> >         arch_call_rest_init();
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index daad787fb795..74ab46e2ebd1 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
> >  obj-$(CONFIG_IRQ_WORK) += irq_work.o
> >  obj-$(CONFIG_CPU_PM) += cpu_pm.o
> >  obj-$(CONFIG_BPF) += bpf/
> > +obj-$(CONFIG_KCSAN) += kcsan/
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> > new file mode 100644
> > index 000000000000..c25f07062d26
> > --- /dev/null
> > +++ b/kernel/kcsan/Makefile
> > @@ -0,0 +1,14 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +KCSAN_SANITIZE := n
> > +KCOV_INSTRUMENT := n
> > +
> > +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> > +
> > +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +
> > +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> > +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> > diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> > new file mode 100644
> > index 000000000000..dd44f7d9e491
> > --- /dev/null
> > +++ b/kernel/kcsan/atomic.c
> > @@ -0,0 +1,21 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/jiffies.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * List all volatile globals that have been observed in races, to suppress
> > + * data-race reports between accesses to these variables.
> > + *
> > + * For now, we assume that volatile accesses of globals are as strong as atomic
> > + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> > + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> > + * than cast to volatile. Eventually, we hope to be able to remove this
> > + * function.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr)
> > +{
> > +       /* only jiffies for now */
> > +       return ptr == &jiffies;
> > +}
> > diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> > new file mode 100644
> > index 000000000000..bc8d60b129eb
> > --- /dev/null
> > +++ b/kernel/kcsan/core.c
> > @@ -0,0 +1,428 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bug.h>
> > +#include <linux/delay.h>
> > +#include <linux/export.h>
> > +#include <linux/init.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/random.h>
> > +#include <linux/sched.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Helper macros to iterate slots, starting from address slot itself, followed
> > + * by the right and left slots.
> > + */
> > +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> > +#define SLOT_IDX(slot, i)                                                      \
> > +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> > +                 KCSAN_CHECK_ADJACENT)) %                                     \
> > +        KCSAN_NUM_WATCHPOINTS)
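
To make the iteration order concrete, here is a small user-space sketch
(not part of the patch). It copies the two macros above and records the
indices they produce for an interior slot. The values chosen for
KCSAN_CHECK_ADJACENT and KCSAN_NUM_WATCHPOINTS are assumptions for the
example, since their definitions are not in this hunk, and slot_order() is a
hypothetical helper.

```c
/* Assumed values for illustration; the real definitions are elsewhere. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64

/* Copied from the hunk above. */
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                              \
	((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -     \
		  KCSAN_CHECK_ADJACENT)) %                             \
	 KCSAN_NUM_WATCHPOINTS)

/* Hypothetical helper: record the order in which slots are probed. */
void slot_order(int slot, int out[CHECK_NUM_SLOTS])
{
	int i;

	for (i = 0; i < CHECK_NUM_SLOTS; ++i)
		out[i] = SLOT_IDX(slot, i);
}
```

With these assumed values, an interior slot such as 5 is probed as 5, then
6 (right), then 4 (left), which matches the comment above the macros.
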
> > +
> > +bool kcsan_enabled;
> > +
> > +/* Per-CPU kcsan_ctx for interrupts */
> > +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> > +       .disable = 0,
> > +       .atomic_next = 0,
> > +       .atomic_region = 0,
> > +       .atomic_region_flat = 0,
> > +};
> > +
> > +/*
> > + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> > + * able to safely update and access a watchpoint without introducing locking
> > + * overhead, we encode each watchpoint as a single atomic long. The initial
> > + * zero-initialized state matches INVALID_WATCHPOINT.
> > + */
> > +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> > +
> > +/*
> > + * Instructions skipped counter; see should_watch().
> > + */
> > +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> > +
> > +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> > +                                            bool expect_write,
> > +                                            long *encoded_watchpoint)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> > +       atomic_long_t *watchpoint;
> > +       unsigned long wp_addr_masked;
> > +       size_t wp_size;
> > +       bool is_write;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               *encoded_watchpoint = atomic_long_read(watchpoint);
> > +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> > +                                      &wp_size, &is_write))
> > +                       continue;
> > +
> > +               if (expect_write && !is_write)
> > +                       continue;
> > +
> > +               /* Check if the watchpoint matches the access. */
> > +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> > +                                              bool is_write)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> > +       atomic_long_t *watchpoint;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               long expect_val = INVALID_WATCHPOINT;
> > +
> > +               /* Try to acquire this slot. */
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> > +                                                   encoded_watchpoint))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was successfully consumed, false otherwise.
> > + *
> > + * This may return false if:
> > + *
> > + *     1. another thread already consumed the watchpoint;
> > + *     2. the thread that set up the watchpoint already removed it;
> > + *     3. the watchpoint was removed and then re-used.
> > + */
> > +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> > +                                         long encoded_watchpoint)
> > +{
> > +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> > +                                              CONSUMED_WATCHPOINT);
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was not touched, false if consumed.
> > + */
> > +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> > +{
> > +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> > +              CONSUMED_WATCHPOINT;
> > +}
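
The slot life cycle implemented by insert_watchpoint(),
try_consume_watchpoint() and remove_watchpoint() can be modelled in user
space with C11 atomics. This is only a sketch under assumptions: the model_*
names are hypothetical, the CONSUMED_WATCHPOINT value is a placeholder (only
the zero value of INVALID_WATCHPOINT is stated in the patch), and the
adjacent-slot search is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* INVALID is zero as stated in core.c; CONSUMED is a placeholder value. */
#define INVALID_WATCHPOINT  0L
#define CONSUMED_WATCHPOINT 1L

/* Owner: claim the slot only if it is currently free. */
bool model_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Racing access: succeeds only if the slot still holds `encoded`. */
bool model_try_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &encoded, CONSUMED_WATCHPOINT,
		memory_order_relaxed, memory_order_relaxed);
}

/* Owner: free the slot; returns true if nobody consumed it meanwhile. */
bool model_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

In this model at most one racing access can consume a given watchpoint, and
the owner learns about it from the return value of the final exchange.
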
> > +
> > +static inline struct kcsan_ctx *get_ctx(void)
> > +{
> > +       /*
> > +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> > +        * also result in calls that generate warnings in uaccess regions.
> > +        */
> > +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> > +}
> > +
> > +
> > +static inline bool is_atomic(const volatile void *ptr)
> > +{
> > +       struct kcsan_ctx *ctx = get_ctx();
> > +
> > +       if (unlikely(ctx->atomic_next > 0)) {
> > +               --ctx->atomic_next;
> > +               return true;
> > +       }
> > +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> > +               return true;
> Won't ctx->atomic_region suffice for both flat and non-flat regions?
> (Do we really need the flat ones?)

The comment in include/linux/kcsan.h explains:
/*
 * We use separate variables to store if we are in a nestable or flat
 * atomic region. This helps make sure that an atomic region with
 * nesting support is not suddenly aborted when a flat region is
 * contained within. Effectively this allows supporting nesting flat
 * atomic regions within an outer nestable atomic region. Support for
 * this is required as there are cases where a seqlock reader critical
 * section (flat atomic region) is contained within a seqlock writer
 * critical section (nestable atomic region), and the "mismatching
 * kcsan_end_atomic()" warning would trigger otherwise.
 */
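
As a rough illustration, the following user-space model (not kernel code;
struct model_ctx and the model_* functions are hypothetical stand-ins, and
the mismatch warning is omitted) keeps the counter and the flag separate in
the same way as struct kcsan_ctx:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two relevant fields of struct kcsan_ctx. */
struct model_ctx {
	int atomic_region;       /* nestable regions: a counter */
	bool atomic_region_flat; /* flat region: a flag */
};

void model_begin_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

void model_end_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest)
		--ctx->atomic_region;
	else
		ctx->atomic_region_flat = false;
}

/* Mirrors the region check in is_atomic(). */
bool model_in_atomic(const struct model_ctx *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

Entering a nestable region, then entering and leaving a flat region inside
it, leaves model_in_atomic() true until the nestable region ends. If ending
the flat region instead cleared a single shared variable, the outer region
would be cut short.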


> > +       return kcsan_is_atomic(ptr);
> > +}
> > +
> > +static inline bool should_watch(const volatile void *ptr)
> > +{
> > +       /*
> > +        * Never set up watchpoints when memory operations are atomic.
> > +        *
> > +        * We need to check this first, because: 1) atomics should not count
> > +        * towards skipped instructions below, and 2) to actually decrement
> > +        * kcsan_atomic_next for each atomic.
> > +        */
> > +       if (is_atomic(ptr))
> > +               return false;
> > +
> > +       /*
> > +        * We use a per-CPU counter, to avoid excessive contention; there is
> > +        * still enough non-determinism for the precise instructions that end up
> > +        * being watched to be mostly unpredictable. Using a PRNG like
> > +        * prandom_u32() turned out to be too slow.
> > +        */
> > +       return (this_cpu_inc_return(kcsan_skip) %
> > +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> > +}
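
The sampling decision in should_watch() boils down to a counter and a
modulo. A minimal user-space sketch, assuming a single counter in place of
the per-CPU one and an arbitrary small skip value (MODEL_WATCH_SKIP_INST and
model_should_watch() are hypothetical names; the is_atomic() check is left
out):

```c
#include <stdbool.h>

/* Arbitrary small value for the sketch; the kernel uses a Kconfig option. */
#define MODEL_WATCH_SKIP_INST 4

/* The kernel keeps one counter per CPU; one static stands in for it here. */
static unsigned long model_skip;

/* Returns true on every MODEL_WATCH_SKIP_INST-th call. */
bool model_should_watch(void)
{
	return (++model_skip % MODEL_WATCH_SKIP_INST) == 0;
}
```

With a skip value of 4, every fourth call selects the access for a
watchpoint, so twelve calls yield three watched accesses.
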
> > +
> > +static inline bool is_enabled(void)
> > +{
> > +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> > +}
> > +
> > +static inline unsigned int get_delay(void)
> > +{
> > +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> > +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> > +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                      ((prandom_u32() % max_delay) + 1) :
> > +                      max_delay;
> > +}
> > +
> > +/* === Public interface ===================================================== */
> > +
> > +void __init kcsan_init(void)
> > +{
> > +       BUG_ON(!in_task());
> > +
> > +       kcsan_debugfs_init();
> > +       kcsan_enable_current();
> > +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> > +       /*
> > +        * We are in the init task, and no other tasks should be running.
> > +        */
> > +       WRITE_ONCE(kcsan_enabled, true);
> > +#endif
> > +}
> > +
> > +/* === Exported interface =================================================== */
> > +
> > +void kcsan_disable_current(void)
> > +{
> > +       ++get_ctx()->disable;
> > +}
> > +EXPORT_SYMBOL(kcsan_disable_current);
> > +
> > +void kcsan_enable_current(void)
> > +{
> > +       if (get_ctx()->disable-- == 0) {
> > +               kcsan_disable_current(); /* restore to 0 */
> > +               kcsan_disable_current();
> > +               WARN(1, "mismatching %s", __func__);
> > +               kcsan_enable_current();
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_enable_current);
> > +
> > +void kcsan_begin_atomic(bool nest)
> > +{
> > +       if (nest)
> > +               ++get_ctx()->atomic_region;
> > +       else
> > +               get_ctx()->atomic_region_flat = true;
> > +}
> > +EXPORT_SYMBOL(kcsan_begin_atomic);
> > +
> > +void kcsan_end_atomic(bool nest)
> > +{
> > +       if (nest) {
> > +               if (get_ctx()->atomic_region-- == 0) {
> > +                       kcsan_begin_atomic(true); /* restore to 0 */
> > +                       kcsan_disable_current();
> > +                       WARN(1, "mismatching %s", __func__);
> > +                       kcsan_enable_current();
> > +               }
> > +       } else {
> > +               get_ctx()->atomic_region_flat = false;
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_end_atomic);
> > +
> > +void kcsan_atomic_next(int n)
> > +{
> > +       get_ctx()->atomic_next = n;
> > +}
> > +EXPORT_SYMBOL(kcsan_atomic_next);
> > +
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       long encoded_watchpoint;
> > +       unsigned long flags;
> > +       enum kcsan_report_type report_type;
> > +
> > +       if (unlikely(!is_enabled()))
> > +               return false;
> > +
> > +       /*
> > +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> > +        * without user_access_save, as the address that ptr points to is only
> > +        * used to check if a watchpoint exists; ptr is never dereferenced.
> > +        */
> > +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> > +                                    &encoded_watchpoint);
> > +       if (watchpoint == NULL)
> > +               return true;
> > +
> > +       flags = user_access_save();
> > +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> > +               /*
> > +                * The other thread may not print any diagnostics, as it has
> > +                * already removed the watchpoint, or another thread consumed
> > +                * the watchpoint before this thread.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_report_races);
> > +               report_type = kcsan_report_race_check_race;
> > +       } else {
> > +               report_type = kcsan_report_race_check;
> > +       }
> > +
> > +       /* Encountered a data-race. */
> > +       kcsan_counter_inc(kcsan_counter_data_races);
> > +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> > +
> > +       user_access_restore(flags);
> > +       return false;
> > +}
> > +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> > +
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       union {
> > +               u8 _1;
> > +               u16 _2;
> > +               u32 _4;
> > +               u64 _8;
> > +       } expect_value;
> > +       bool is_expected = true;
> > +       unsigned long ua_flags = user_access_save();
> > +       unsigned long irq_flags;
> > +
> > +       if (!should_watch(ptr))
> > +               goto out;
> > +
> > +       if (!check_encodable((unsigned long)ptr, size)) {
> > +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> > +               goto out;
> > +       }
> > +
> > +       /*
> > +        * Disable interrupts & preemptions to avoid another thread on the same
> > +        * CPU accessing memory locations for the set up watchpoint; this is to
> > +        * avoid reporting races to e.g. CPU-local data.
> > +        *
> > +        * An alternative would be adding the source CPU to the watchpoint
> > +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> > +        * several problems with this:
> > +        *   1. we should avoid stealing more bits from the watchpoint encoding
> > +        *      as it would affect accuracy, as well as increase performance
> > +        *      overhead in the fast-path;
> > +        *   2. if we are preempted, but there *is* a genuine data-race, we
> > +        *      would *not* report it -- since this is the common case (vs.
> > +        *      CPU-local data accesses), it makes more sense (from a data-race
> > +        *      detection PoV) to simply disable preemptions to ensure as many
> > +        *      tasks as possible run on other CPUs.
> > +        */
> > +       local_irq_save(irq_flags);
> > +
> > +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> > +       if (watchpoint == NULL) {
> > +               /*
> > +                * Out of capacity: the size of `watchpoints`, and the frequency
> > +                * with which `should_watch()` returns true should be tweaked so
> > +                * that this case happens very rarely.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_no_capacity);
> > +               goto out_unlock;
> > +       }
> > +
> > +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> > +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> > +
> > +       /*
> > +        * Read the current value, to later check and infer a race if the data
> > +        * was modified via a non-instrumented access, e.g. from a device.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +#ifdef CONFIG_KCSAN_DEBUG
> > +       kcsan_disable_current();
> > +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +              is_write ? "write" : "read", size, ptr,
> > +              watchpoint_slot((unsigned long)ptr),
> > +              encode_watchpoint((unsigned long)ptr, size, is_write));
> > +       kcsan_enable_current();
> > +#endif
> > +
> > +       /*
> > +        * Delay this thread, to increase probability of observing a racy
> > +        * conflicting access.
> > +        */
> > +       udelay(get_delay());
> > +
> > +       /*
> > +        * Re-read value, and check if it is as expected; if not, we infer a
> > +        * racy access.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +       /* Check if this access raced with another. */
> > +       if (!remove_watchpoint(watchpoint)) {
> > +               /*
> > +                * No need to increment 'race' counter, as the racing thread
> > +                * already did.
> > +                */
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_setup);
> > +       } else if (!is_expected) {
> > +               /* Inferring a race, since the value should not have changed. */
> > +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_unknown_origin);
> > +#endif
> > +       }
> > +
> > +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> > +out_unlock:
> > +       local_irq_restore(irq_flags);
> > +out:
> > +       user_access_restore(ua_flags);
> > +}
> > +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> > diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> > new file mode 100644
> > index 000000000000..6ddcbd185f3a
> > --- /dev/null
> > +++ b/kernel/kcsan/debugfs.c
> > @@ -0,0 +1,225 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bsearch.h>
> > +#include <linux/bug.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/init.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/mm.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/sort.h>
> > +#include <linux/string.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * Statistics counters.
> > + */
> > +static atomic_long_t counters[kcsan_counter_count];
> > +
> > +/*
> > + * Addresses for filtering functions from reporting. This list can be used as a
> > + * whitelist or blacklist.
> > + */
> > +static struct {
> > +       unsigned long *addrs; /* array of addresses */
> > +       size_t size; /* current size */
> > +       int used; /* number of elements used */
> > +       bool sorted; /* if elements are sorted */
> > +       bool whitelist; /* if list is a blacklist or whitelist */
> > +} report_filterlist = {
> > +       .addrs = NULL,
> > +       .size = 8, /* small initial size */
> > +       .used = 0,
> > +       .sorted = false,
> > +       .whitelist = false, /* default is blacklist */
> > +};
> > +static DEFINE_SPINLOCK(report_filterlist_lock);
> > +
> > +static const char *counter_to_name(enum kcsan_counter_id id)
> > +{
> > +       switch (id) {
> > +       case kcsan_counter_used_watchpoints:
> > +               return "used_watchpoints";
> > +       case kcsan_counter_setup_watchpoints:
> > +               return "setup_watchpoints";
> > +       case kcsan_counter_data_races:
> > +               return "data_races";
> > +       case kcsan_counter_no_capacity:
> > +               return "no_capacity";
> > +       case kcsan_counter_report_races:
> > +               return "report_races";
> > +       case kcsan_counter_races_unknown_origin:
> > +               return "races_unknown_origin";
> > +       case kcsan_counter_unencodable_accesses:
> > +               return "unencodable_accesses";
> > +       case kcsan_counter_encoding_false_positives:
> > +               return "encoding_false_positives";
> > +       case kcsan_counter_count:
> > +               BUG();
> > +       }
> > +       return NULL;
> > +}
> > +
> > +void kcsan_counter_inc(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_inc(&counters[id]);
> > +}
> > +
> > +void kcsan_counter_dec(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_dec(&counters[id]);
> > +}
> > +
> > +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> > +{
> > +       const unsigned long a = *(const unsigned long *)rhs;
> > +       const unsigned long b = *(const unsigned long *)lhs;
> > +
> > +       return a < b ? -1 : a == b ? 0 : 1;
> > +}
> > +
> > +bool kcsan_skip_report(unsigned long func_addr)
> > +{
> > +       unsigned long symbolsize, offset;
> > +       unsigned long flags;
> > +       bool ret = false;
> > +
> > +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> > +               return false;
> > +       func_addr -= offset; /* get function start */
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       if (report_filterlist.used == 0)
> > +               goto out;
> > +
> > +       /* Sort array if it is unsorted, and then do a binary search. */
> > +       if (!report_filterlist.sorted) {
> > +               sort(report_filterlist.addrs, report_filterlist.used,
> > +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> > +               report_filterlist.sorted = true;
> > +       }
> > +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> > +                       report_filterlist.used, sizeof(unsigned long),
> > +                       cmp_filterlist_addrs);
> > +       if (report_filterlist.whitelist)
> > +               ret = !ret;
> > +
> > +out:
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +       return ret;
> > +}
> > +
> > +static void set_report_filterlist_whitelist(bool whitelist)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       report_filterlist.whitelist = whitelist;
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static void insert_report_filterlist(const char *func)
> > +{
> > +       unsigned long flags;
> > +       unsigned long addr = kallsyms_lookup_name(func);
> > +
> > +       if (!addr) {
> > +               pr_err("KCSAN: could not find function: '%s'\n", func);
> > +               return;
> > +       }
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +
> > +       if (report_filterlist.addrs == NULL)
> > +               report_filterlist.addrs = /* initial allocation */
> > +                       kvmalloc_array(report_filterlist.size,
> > +                                      sizeof(unsigned long), GFP_KERNEL);
> You need to use braces in both branches here:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

Done @ v3.
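
For reference, the braced form looks roughly like the sketch below.
This is a userspace illustration only, not the v3 code: `struct
filterlist` and `filterlist_append()` are made-up names, and
calloc()/free() stand in for kvmalloc_array()/kvfree(). The sketch also
checks the allocation result, which the quoted version does not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for report_filterlist. */
struct filterlist {
	unsigned long *addrs;	/* array of addresses */
	size_t size;		/* current capacity */
	size_t used;		/* number of elements used */
};

/* Append addr, growing the array if needed; 0 on success, -1 on OOM. */
static int filterlist_append(struct filterlist *fl, unsigned long addr)
{
	assert(fl->size > 0);

	if (fl->addrs == NULL) {
		/* initial allocation; braces here because the else branch has them */
		fl->addrs = calloc(fl->size, sizeof(*fl->addrs));
		if (fl->addrs == NULL)
			return -1;
	} else if (fl->used == fl->size) {
		/* resize: double the capacity and copy the used entries */
		unsigned long *new_addrs;

		new_addrs = calloc(fl->size * 2, sizeof(*new_addrs));
		if (new_addrs == NULL)
			return -1;
		memcpy(new_addrs, fl->addrs, fl->used * sizeof(*new_addrs));
		free(fl->addrs);
		fl->addrs = new_addrs;
		fl->size *= 2;
	}

	fl->addrs[fl->used++] = addr;
	return 0;
}
```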

> > +       else if (report_filterlist.used == report_filterlist.size) {
> > +               /* resize filterlist */
> > +               unsigned long *new_addrs;
> > +
> > +               report_filterlist.size *= 2;
> > +               new_addrs = kvmalloc_array(report_filterlist.size,
> > +                                          sizeof(unsigned long), GFP_KERNEL);
> > +               memcpy(new_addrs, report_filterlist.addrs,
> > +                      report_filterlist.used * sizeof(unsigned long));
> > +               kvfree(report_filterlist.addrs);
> > +               report_filterlist.addrs = new_addrs;
> > +       }
> > +
> > +       /* Note: deduplicating should be done in userspace. */
> > +       report_filterlist.addrs[report_filterlist.used++] =
> > +               kallsyms_lookup_name(func);
> > +       report_filterlist.sorted = false;
> > +
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static int show_info(struct seq_file *file, void *v)
> > +{
> > +       int i;
> > +       unsigned long flags;
> > +
> > +       /* show stats */
> > +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> > +       for (i = 0; i < kcsan_counter_count; ++i)
> > +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> > +                          atomic_long_read(&counters[i]));
> > +
> > +       /* show filter functions, and filter type */
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       seq_printf(file, "\n%s functions: %s\n",
> > +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> > +                  report_filterlist.used == 0 ? "none" : "");
> > +       for (i = 0; i < report_filterlist.used; ++i)
> > +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +
> > +       return 0;
> > +}
> > +
> > +static int debugfs_open(struct inode *inode, struct file *file)
> > +{
> > +       return single_open(file, show_info, NULL);
> > +}
> > +
> > +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> > +                            size_t count, loff_t *off)
> > +{
> > +       char kbuf[KSYM_NAME_LEN];
> > +       char *arg;
> > +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> > +
> > +       if (copy_from_user(kbuf, buf, read_len))
> > +               return -EINVAL;
> > +       kbuf[read_len] = '\0';
> > +       arg = strstrip(kbuf);
> > +
> > +       if (!strncmp(arg, "on", sizeof("on") - 1))
> > +               WRITE_ONCE(kcsan_enabled, true);
> > +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> > +               WRITE_ONCE(kcsan_enabled, false);
> > +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> > +               set_report_filterlist_whitelist(true);
> > +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> > +               set_report_filterlist_whitelist(false);
> > +       else if (arg[0] == '!')
> > +               insert_report_filterlist(&arg[1]);
> > +       else
> > +               return -EINVAL;
> > +
> > +       return count;
> > +}
> > +
> > +static const struct file_operations debugfs_ops = { .read = seq_read,
> > +                                                   .open = debugfs_open,
> > +                                                   .write = debugfs_write,
> > +                                                   .release = single_release };
> > +
> > +void __init kcsan_debugfs_init(void)
> > +{
> > +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> > +}
> > diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> > new file mode 100644
> > index 000000000000..8f9b1ce0e59f
> > --- /dev/null
> > +++ b/kernel/kcsan/encoding.h
> > @@ -0,0 +1,94 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_ENCODING_H
> > +#define _MM_KCSAN_ENCODING_H
> > +
> > +#include <linux/bits.h>
> > +#include <linux/log2.h>
> > +#include <linux/mm.h>
> > +
> > +#include "kcsan.h"
> > +
> > +#define SLOT_RANGE PAGE_SIZE
> > +#define INVALID_WATCHPOINT 0
> > +#define CONSUMED_WATCHPOINT 1
> > +
> > +/*
> > + * The maximum useful size of accesses for which we set up watchpoints is the
> > + * max range of slots we check on an access.
> > + */
> > +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> > +
> > +/*
> > + * Number of bits we use to store size info.
> > + */
> > +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> > +/*
> > + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> > + * however, most 64-bit architectures do not use the full 64-bit address space.
> > + * Also, in order for a false positive to be observable 2 things need to happen:
> > + *
> > + *     1. different addresses but with the same encoded address race;
> > + *     2. and both map onto the same watchpoint slots;
> > + *
> > + * Both these are assumed to be very unlikely. However, in case it still
> > + * happens, the report logic will filter out the false positive (see report.c).
> > + */
> > +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> > +
> > +/*
> > + * Masks to set/retrieve the encoded data.
> > + */
> > +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> > +#define WATCHPOINT_SIZE_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> > +#define WATCHPOINT_ADDR_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> > +
> > +static inline bool check_encodable(unsigned long addr, size_t size)
> > +{
> > +       return size <= MAX_ENCODABLE_SIZE;
> > +}
> > +
> > +static inline long encode_watchpoint(unsigned long addr, size_t size,
> > +                                    bool is_write)
> > +{
> > +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> > +                     (size << WATCHPOINT_ADDR_BITS) |
> > +                     (addr & WATCHPOINT_ADDR_MASK));
> > +}
> > +
> > +static inline bool decode_watchpoint(long watchpoint,
> > +                                    unsigned long *addr_masked, size_t *size,
> > +                                    bool *is_write)
> > +{
> > +       if (watchpoint == INVALID_WATCHPOINT ||
> > +           watchpoint == CONSUMED_WATCHPOINT)
> > +               return false;
> > +
> > +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> > +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> > +               WATCHPOINT_ADDR_BITS;
> > +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> > +
> > +       return true;
> > +}
> > +
> > +/*
> > + * Return watchpoint slot for an address.
> > + */
> > +static inline int watchpoint_slot(unsigned long addr)
> > +{
> > +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> > +}
> > +
> > +static inline bool matching_access(unsigned long addr1, size_t size1,
> > +                                  unsigned long addr2, size_t size2)
> > +{
> > +       unsigned long end_range1 = addr1 + size1 - 1;
> > +       unsigned long end_range2 = addr2 + size2 - 1;
> > +
> > +       return addr1 <= end_range2 && addr2 <= end_range1;
> > +}
> > +
> > +#endif /* _MM_KCSAN_ENCODING_H */
> > diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> > new file mode 100644
> > index 000000000000..45cf2fffd8a0
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> > + * see Documentation/dev-tools/kcsan.rst.
> > + */
> > +
> > +#include <linux/export.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * KCSAN uses the same instrumentation that is emitted by supported compilers
> > + * for Thread Sanitizer (TSAN).
> > + *
> > + * When enabled, the compiler emits instrumentation calls (the functions
> > + * prefixed with "__tsan" below) for all loads and stores that it generated;
> > + * inline asm is not instrumented.
> > + */
> > +
> > +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> > +       void __tsan_read##size(void *ptr)                                      \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> > +       void __tsan_write##size(void *ptr)                                     \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_write##size)
> > +
> > +DEFINE_TSAN_READ_WRITE(1);
> > +DEFINE_TSAN_READ_WRITE(2);
> > +DEFINE_TSAN_READ_WRITE(4);
> > +DEFINE_TSAN_READ_WRITE(8);
> > +DEFINE_TSAN_READ_WRITE(16);
> > +
> > +/*
> > + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> > + * but e.g. recent versions of Clang do.
> > + */
> > +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> > +       void __tsan_unaligned_read##size(void *ptr)                            \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> > +       void __tsan_unaligned_write##size(void *ptr)                           \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> > +
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> > +
> > +void __tsan_read_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_read(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_read_range);
> > +
> > +void __tsan_write_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_write(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_write_range);
> > +
> > +/*
> > + * The below are not required by KCSAN, but can still be emitted by the compiler.
> > + */
> > +void __tsan_func_entry(void *call_pc)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_entry);
> > +void __tsan_func_exit(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_exit);
> > +void __tsan_init(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_init);
> > diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> > new file mode 100644
> > index 000000000000..429479b3041d
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.h
> > @@ -0,0 +1,140 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_KCSAN_H
> > +#define _MM_KCSAN_KCSAN_H
> > +
> > +#include <linux/kcsan.h>
> > +
> > +/*
> > + * Total number of watchpoints. An address range maps into a specific slot as
> > + * specified in `encoding.h`. Although a larger number of watchpoints may not even
> > + * be usable due to limited thread count, a larger value will improve
> > + * performance due to reducing cache-line contention.
> > + */
> > +#define KCSAN_NUM_WATCHPOINTS 64
> > +
> > +/*
> > + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> > + *
> > + *     1. the address slot is already occupied, check if any adjacent slots are
> > + *        free;
> > + *     2. accesses that straddle a slot boundary due to size that exceeds a
> > + *        slot's range may check adjacent slots if any watchpoint matches.
> > + *
> > + * Note that accesses with very large size may still miss a watchpoint; however,
> > + * given this should be rare, this is a reasonable trade-off to make, since this
> > + * will avoid:
> > + *
> > + *     1. excessive contention between watchpoint checks and setup;
> > + *     2. larger number of simultaneous watchpoints without sacrificing
> > + *        performance.
> > + */
> > +#define KCSAN_CHECK_ADJACENT 1
> > +
> > +/*
> > + * Globally enable and disable KCSAN.
> > + */
> > +extern bool kcsan_enabled;
> > +
> > +/*
> > + * Helper that returns true if access to ptr should be considered as an atomic
> > + * access, even though it is not explicitly atomic.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr);
> > +
> > +/*
> > + * Initialize debugfs file.
> > + */
> > +void kcsan_debugfs_init(void);
> > +
> > +enum kcsan_counter_id {
> Labels in enums should be capitalized:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl

Done @ v3.
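
As an illustration of the capitalized style, here is a shortened sketch
with only three of the counters. The exact label names used in v3 may
differ; the ones below are assumptions for the example, and the
`KCSAN_COUNTER_COUNT` case returns NULL instead of calling BUG() so the
snippet builds in userspace.

```c
#include <assert.h>
#include <stddef.h>

enum kcsan_counter_id {
	KCSAN_COUNTER_USED_WATCHPOINTS,		/* watchpoints currently in use */
	KCSAN_COUNTER_SETUP_WATCHPOINTS,	/* total watchpoints set up */
	KCSAN_COUNTER_DATA_RACES,		/* total data-races */

	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* The sentinel doubles as the array size, so it must track the list above. */
static_assert(KCSAN_COUNTER_COUNT == 3, "update when adding counters");

static const char *counter_to_name(enum kcsan_counter_id id)
{
	switch (id) {
	case KCSAN_COUNTER_USED_WATCHPOINTS:
		return "used_watchpoints";
	case KCSAN_COUNTER_SETUP_WATCHPOINTS:
		return "setup_watchpoints";
	case KCSAN_COUNTER_DATA_RACES:
		return "data_races";
	case KCSAN_COUNTER_COUNT:
		break; /* not a real counter */
	}
	return NULL;
}
```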

> > +       /*
> > +        * Number of watchpoints currently in use.
> > +        */
> > +       kcsan_counter_used_watchpoints,
> > +
> > +       /*
> > +        * Total number of watchpoints set up.
> > +        */
> > +       kcsan_counter_setup_watchpoints,
> > +
> > +       /*
> > +        * Total number of data-races.
> > +        */
> > +       kcsan_counter_data_races,
> > +
> > +       /*
> > +        * Number of times no watchpoints were available.
> > +        */
> > +       kcsan_counter_no_capacity,
> > +
> > +       /*
> > +        * A thread checking a watchpoint raced with another checking thread;
> > +        * only one will be reported.
> > +        */
> > +       kcsan_counter_report_races,
> > +
> > +       /*
> > +        * Observed data value change, but writer thread unknown.
> > +        */
> > +       kcsan_counter_races_unknown_origin,
> > +
> > +       /*
> > +        * The access cannot be encoded to a valid watchpoint.
> > +        */
> > +       kcsan_counter_unencodable_accesses,
> > +
> > +       /*
> > +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> > +        * accesses.
> > +        */
> > +       kcsan_counter_encoding_false_positives,
> > +
> > +       kcsan_counter_count, /* number of counters */
> > +};
> > +
> > +/*
> > + * Increment/decrement counter with given id; avoid calling these in fast-path.
> > + */
> > +void kcsan_counter_inc(enum kcsan_counter_id id);
> > +void kcsan_counter_dec(enum kcsan_counter_id id);
> > +
> > +/*
> > + * Returns true if data-races in the function symbol that maps to addr (offsets
> > + * are ignored) should *not* be reported.
> > + */
> > +bool kcsan_skip_report(unsigned long func_addr);
> > +
> > +enum kcsan_report_type {
> > +       /*
> > +        * The thread that set up the watchpoint and briefly stalled was
> > +        * signalled that another thread triggered the watchpoint, and thus a
> > +        * race was encountered.
> > +        */
> > +       kcsan_report_race_setup,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, therefore a race
> > +        * was encountered.
> > +        */
> > +       kcsan_report_race_check,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, but the other
> > +        * racing thread can no longer be signaled that a race occurred.
> > +        */
> > +       kcsan_report_race_check_race,
> > +
> > +       /*
> > +        * No other thread was observed to race with the access, but the data
> > +        * value before and after the stall differs.
> > +        */
> > +       kcsan_report_race_unknown_origin,
> > +};
> > +/*
> > + * Print a race report from thread that encountered the race.
> > + */
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type);
> > +
> > +#endif /* _MM_KCSAN_KCSAN_H */
> > diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> > new file mode 100644
> > index 000000000000..517db539e4e7
> > --- /dev/null
> > +++ b/kernel/kcsan/report.c
> > @@ -0,0 +1,306 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/preempt.h>
> > +#include <linux/printk.h>
> > +#include <linux/sched.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/stacktrace.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Max. number of stack entries to show in the report.
> > + */
> > +#define NUM_STACK_ENTRIES 16
> > +
> > +/*
> > + * Other thread info: communicated from other racing thread to thread that set
> > + * up the watchpoint, which then prints the complete report atomically. Only
> > + * need one struct, as all threads should be serialized regardless to print
> > + * the reports, with reporting being in the slow-path.
> > + */
> > +static struct {
> > +       const volatile void *ptr;
> > +       size_t size;
> > +       bool is_write;
> > +       int task_pid;
> > +       int cpu_id;
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> > +       int num_stack_entries;
> > +} other_info = { .ptr = NULL };
> > +
> > +static DEFINE_SPINLOCK(other_info_lock);
> > +static DEFINE_SPINLOCK(report_lock);
> > +
> > +static bool set_or_lock_other_info(unsigned long *flags,
> > +                                  const volatile void *ptr, size_t size,
> > +                                  bool is_write, int cpu_id,
> > +                                  enum kcsan_report_type type)
> > +{
> > +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> > +               return true;
> > +
> > +       for (;;) {
> > +               spin_lock_irqsave(&other_info_lock, *flags);
> > +
> > +               switch (type) {
> > +               case kcsan_report_race_check:
> > +                       if (other_info.ptr != NULL) {
> > +                               /* still in use, retry */
> > +                               break;
> > +                       }
> > +                       other_info.ptr = ptr;
> > +                       other_info.size = size;
> > +                       other_info.is_write = is_write;
> > +                       other_info.task_pid =
> > +                               in_task() ? task_pid_nr(current) : -1;
> > +                       other_info.cpu_id = cpu_id;
> > +                       other_info.num_stack_entries = stack_trace_save(
> > +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> > +                       /*
> > +                        * other_info may now be consumed by thread we raced
> > +                        * with.
> > +                        */
> > +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> > +                       return false;
> > +
> > +               case kcsan_report_race_setup:
> > +                       if (other_info.ptr == NULL)
> > +                               break; /* no data available yet, retry */
> > +
> > +                       /*
> > +                        * First check if matching based on how watchpoint was
> > +                        * encoded.
> > +                        */
> > +                       if (!matching_access((unsigned long)other_info.ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            size))
> > +                               break; /* mismatching access, retry */
> > +
> > +                       if (!matching_access((unsigned long)other_info.ptr,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr, size)) {
> > +                               /*
> > +                                * If the actual accesses do not match, this was
> > +                                * a false positive due to watchpoint encoding.
> > +                                */
> > +                               other_info.ptr = NULL; /* mark for reuse */
> > +                               kcsan_counter_inc(
> > +                                       kcsan_counter_encoding_false_positives);
> > +                               spin_unlock_irqrestore(&other_info_lock,
> > +                                                      *flags);
> > +                               return false;
> > +                       }
> > +
> > +                       /*
> > +                        * Matching access: keep other_info locked, as this
> > +                        * thread uses it to print the full report; unlocked in
> > +                        * end_report.
> > +                        */
> > +                       return true;
> > +
> > +               default:
> > +                       BUG();
> > +               }
> > +
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +       }
> > +}
> > +
> > +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               /* irqsaved already via other_info_lock */
> > +               spin_lock(&report_lock);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_lock_irqsave(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               other_info.ptr = NULL; /* mark for reuse */
> > +               spin_unlock(&report_lock);
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_unlock_irqrestore(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static const char *get_access_type(bool is_write)
> > +{
> > +       return is_write ? "write" : "read";
> > +}
> > +
> > +/* Return thread description: in task or interrupt. */
> > +static const char *get_thread_desc(int task_id)
> > +{
> > +       if (task_id != -1) {
> > +               static char buf[32]; /* safe: protected by report_lock */
> > +
> > +               snprintf(buf, sizeof(buf), "task %i", task_id);
> > +               return buf;
> > +       }
> > +       return in_nmi() ? "NMI" : "interrupt";
> > +}
> > +
> > +/* Helper to skip KCSAN-related functions in stack-trace. */
> > +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> > +{
> > +       char buf[64];
> > +       int skip = 0;
> > +
> > +       for (; skip < num_entries; ++skip) {
> > +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> > +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> > +                       break;
> > +               }
> > +       }
> > +       return skip;
> > +}
> FWIW another option is to put all KCSAN-related functions in a
> separate code section and check if the function addresses are in the
> address range belonging to that section.
> This will work even with non-symbolized stacks.

Thanks for the suggestion. Is it worth it, i.e. will it simplify the
design and code? If it simplifies the design (or makes the fast-path
significantly faster), then yes, but otherwise I prefer the simplest
possible solution here. AFAIK, it would make things neither simpler nor
faster. Using non-symbolized stacks should not be the common use-case
(how would one usefully debug a data-race without symbols?).
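
For illustration, the name-based skip boils down to something like the
userspace sketch below. It assumes the frames are already symbolized
strings (the patch formats each address with "%ps" first), and
skip_kcsan_frames is a made-up name for this example, not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch (hypothetical helper): return the number of leading
 * frames that look like sanitizer-runtime internals, judged by name only.
 */
static int skip_kcsan_frames(const char *const frames[], int num_frames)
{
	int skip = 0;

	assert(num_frames >= 0);
	for (; skip < num_frames; ++skip) {
		const char *name = frames[skip];

		/* Stop at the first frame that is not runtime-internal. */
		if (!strstr(name, "csan_") && !strstr(name, "tsan_") &&
		    !strstr(name, "_once_size"))
			break;
	}
	return skip;
}
```

Note that the result equals num_frames when every frame matches, so a
caller that indexes the array with it has to account for that case.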

> > +/* Compares symbolized strings of addr1 and addr2. */
> > +static int sym_strcmp(void *addr1, void *addr2)
> > +{
> > +       char buf1[64];
> > +       char buf2[64];
> > +
> > +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> > +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> > +       return strncmp(buf1, buf2, sizeof(buf1));
> > +}
> > +
> > +/*
> > + * Returns true if a report was generated, false otherwise.
> > + */
> > +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> > +                         int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> > +       int num_stack_entries =
> > +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> > +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> > +       int other_skipnr;
> > +
> > +       /* Check if the top stackframe is in a blacklisted function. */
> > +       if (kcsan_skip_report(stack_entries[skipnr]))
> > +               return false;
> > +       if (type == kcsan_report_race_setup) {
> > +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> > +                                               other_info.num_stack_entries);
> > +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> > +                       return false;
> > +       }
> > +
> > +       /* Print report header. */
> > +       pr_err("==================================================================\n");
> > +       switch (type) {
> > +       case kcsan_report_race_setup: {
> > +               void *this_fn = (void *)stack_entries[skipnr];
> > +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> > +               int cmp;
> > +
> > +               /*
> > +                * Order functions lexicographically for consistent bug titles.
> > +                * Do not print offset of functions to keep title short.
> > +                */
> > +               cmp = sym_strcmp(other_fn, this_fn);
> > +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> > +                      cmp < 0 ? other_fn : this_fn,
> > +                      cmp < 0 ? this_fn : other_fn);
> > +       } break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("BUG: KCSAN: data-race in %pS\n",
> > +                      (void *)stack_entries[skipnr]);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +
> > +       pr_err("\n");
> > +
> > +       /* Print information about the racing accesses. */
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(other_info.is_write), other_info.ptr,
> > +                      other_info.size, get_thread_desc(other_info.task_pid),
> > +                      other_info.cpu_id);
> > +
> > +               /* Print the other thread's stack trace. */
> > +               stack_trace_print(other_info.stack_entries + other_skipnr,
> > +                                 other_info.num_stack_entries - other_skipnr,
> > +                                 0);
> > +
> > +               pr_err("\n");
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +       /* Print stack trace of this thread. */
> > +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> > +                         0);
> > +
> > +       /* Print report footer. */
> > +       pr_err("\n");
> > +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> > +       dump_stack_print_info(KERN_DEFAULT);
> > +       pr_err("==================================================================\n");
> > +
> > +       return true;
> > +}
> > +
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long flags = 0;
> > +
> > +       if (type == kcsan_report_race_check_race)
> > +               return;
> > +
> > +       kcsan_disable_current();
> > +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> > +               start_report(&flags, type);
> > +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> > +                   panic_on_warn)
> > +                       panic("panic_on_warn set ...\n");
> > +
> > +               end_report(&flags, type);
> > +       }
> > +       kcsan_enable_current();
> > +}
> > diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> > new file mode 100644
> > index 000000000000..68c896a24529
> > --- /dev/null
> > +++ b/kernel/kcsan/test.c
> > @@ -0,0 +1,117 @@
> > +// SPDX-License-Identifier: GPL-2.0
> IIRC checkpatch.pl requires all SPDX headers to look like this one
> (C++-style, not C-style).
> Please double check and fix the headers in other files if necessary.

Checkpatch is happy: // for .c files, and /* */ for .h files.

> This file might also use some comments, now it's not easy to
> understand what it's testing.

Done @ v3.

> > +
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/printk.h>
> > +#include <linux/random.h>
> > +#include <linux/types.h>
> > +
> > +#include "encoding.h"
> > +
> > +#define ITERS_PER_TEST 2000
> > +
> > +/* Test requirements. */
> > +static bool test_requires(void)
> > +{
> > +       /* random should be initialized */
> > +       return prandom_u32() + prandom_u32() != 0;
> > +}
> > +
> > +/* Test watchpoint encode and decode. */
> > +static bool test_encode_decode(void)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> > +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> > +               bool is_write = prandom_u32() % 2;
> > +               unsigned long addr;
> > +
> > +               prandom_bytes(&addr, sizeof(addr));
> > +               if (WARN_ON(!check_encodable(addr, size)))
> > +                       return false;
> > +
> > +               /* encode and decode */
> > +               {
> > +                       const long encoded_watchpoint =
> > +                               encode_watchpoint(addr, size, is_write);
> > +                       unsigned long verif_masked_addr;
> > +                       size_t verif_size;
> > +                       bool verif_is_write;
> > +
> > +                       /* check special watchpoints */
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +
> > +                       /* check decoding watchpoint returns same data */
> > +                       if (WARN_ON(!decode_watchpoint(
> > +                                   encoded_watchpoint, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(verif_masked_addr !=
> > +                                   (addr & WATCHPOINT_ADDR_MASK)))
> > +                               goto fail;
> > +                       if (WARN_ON(verif_size != size))
> > +                               goto fail;
> > +                       if (WARN_ON(is_write != verif_is_write))
> > +                               goto fail;
> > +
> > +                       continue;
> > +fail:
> > +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> > +                              __func__, is_write ? "write" : "read", size,
> > +                              addr, encoded_watchpoint,
> > +                              verif_is_write ? "write" : "read", verif_size,
> > +                              verif_masked_addr);
> > +                       return false;
> > +               }
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static bool test_matching_access(void)
> > +{
> > +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> > +               return false;
> > +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> > +               return false;
> > +       return true;
> > +}
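
The five cases above are consistent with a plain range-overlap check.
matching_access() itself lives in encoding.h and is not shown in this
part of the thread, so the sketch below is an assumption about its
semantics; ranges_overlap is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed semantics: true if the byte ranges [addr1, addr1 + size1) and
 * [addr2, addr2 + size2) share at least one byte.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	assert(size1 > 0 && size2 > 0);

	/*
	 * Inclusive end addresses do not wrap for a range that ends at the
	 * very top of the address space.
	 */
	unsigned long end1 = addr1 + size1 - 1;
	unsigned long end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

With this definition, (10, 2) vs (11, 1) overlaps at byte 11, while
(10, 1) vs (11, 1) are adjacent and do not overlap.
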
> > +
> > +static int __init kcsan_selftest(void)
> > +{
> > +       int passed = 0;
> > +       int total = 0;
> > +
> > +#define RUN_TEST(do_test)                                                      \
> > +       do {                                                                   \
> > +               ++total;                                                       \
> > +               if (do_test())                                                 \
> > +                       ++passed;                                              \
> > +               else                                                           \
> > +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> > +       } while (0)
> > +
> > +       RUN_TEST(test_requires);
> > +       RUN_TEST(test_encode_decode);
> > +       RUN_TEST(test_matching_access);
> > +
> > +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> > +       if (passed != total)
> > +               panic("KCSAN selftests failed");
> > +       return 0;
> > +}
> > +postcore_initcall(kcsan_selftest);
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 93d97f9b0157..35accd1d93de 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
> >
> >  source "lib/Kconfig.ubsan"
> >
> > +source "lib/Kconfig.kcsan"
> > +
> >  config ARCH_HAS_DEVMEM_IS_ALLOWED
> >         bool
> >
> > diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> > new file mode 100644
> > index 000000000000..3e1f1acfb24b
> > --- /dev/null
> > +++ b/lib/Kconfig.kcsan
> > @@ -0,0 +1,88 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config HAVE_ARCH_KCSAN
> > +       bool
> > +
> > +menuconfig KCSAN
> > +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> > +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> > +       default n
> > +       help
> > +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> > +         uses a watchpoint-based sampling approach to detect races.
> > +
> > +if KCSAN
> > +
> > +config KCSAN_SELFTEST
> > +       bool "KCSAN: perform short selftests on boot"
> > +       default y
> > +       help
> > +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> > +
> > +config KCSAN_EARLY_ENABLE
> > +       bool "KCSAN: early enable"
> > +       default y
> > +       help
> > +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> > +         later be enabled/disabled via debugfs.
> > +
> > +config KCSAN_UDELAY_MAX_TASK
> > +       int "KCSAN: maximum delay in microseconds (for tasks)"
> > +       default 80
> > +       help
> > +         For tasks, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_UDELAY_MAX_INTERRUPT
> > +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> > +       default 20
> > +       help
> > +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_DELAY_RANDOMIZE
> > +       bool "KCSAN: randomize delays"
> > +       default y
> > +       help
> > +         If delays should be randomized; if false, the chosen delay is simply
> > +         the maximum values defined above.
> > +
> > +config KCSAN_WATCH_SKIP_INST
> > +       int "KCSAN: watchpoint instruction skip"
> > +       default 2000
> > +       help
> > +         The number of per-CPU memory operations to skip watching, before
> > +         another watchpoint is set up; in other words, 1 in
> > +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> > +         watchpoint. A smaller value results in more aggressive race
> > +         detection, whereas a larger value improves system performance at the
> > +         cost of missing some races.
> > +
> > +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +       bool "KCSAN: report races of unknown origin"
> > +       default y
> > +       help
> > +         If KCSAN should report races where only one access is known, and the
> > +         conflicting access is of unknown origin. This type of race is
> > +         reported if it was only possible to infer a race due to a data-value
> > +         change while an access is being delayed on a watchpoint.
> > +
> > +config KCSAN_IGNORE_ATOMICS
> > +       bool "KCSAN: do not instrument marked atomic accesses"
> > +       default n
> > +       help
> > +         If enabled, never instruments marked atomic accesses. This results in
> > +         not reporting data-races where one access is atomic and the other is
> > +         a plain access.
> > +
> Isn't it better to decide at runtime, whether we want to ignore atomics or not?

See below.

> > +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> > +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> > +       default n
> > +       help
> > +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> > +         This option should only be used to prune initial data-races found in
> > +         existing code.
> Overall, I think it's better to make most of these configs boot-time flags.
> This way one won't need to rebuild the kernel every time they want to
> turn some option on or off.

From a design point of view, this complicates things on several
fronts. For one, I would prefer having config options in one place.
Moreover, most of these were added to "tame" syzbot and keep reporting
volume low initially. I do not expect them to be switched frequently,
so for simplicity's sake, and to optimize for the common use-case,
it'll be better to keep things as-is. Eventually, these options might
even go away completely.

I will add a comment to that effect above these options for v3.

> > +config KCSAN_DEBUG
> > +       bool "Debugging of KCSAN internals"
> > +       default n
> > +
> > +endif # KCSAN
> > diff --git a/lib/Makefile b/lib/Makefile
> > index c5892807e06f..778ab704e3ad 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
> >  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
> >  endif
> >
> > +# Used by KCSAN while enabled, avoid recursion.
> > +KCSAN_SANITIZE_random32.o := n
> > +
> >  lib-y := ctype.o string.o vsprintf.o cmdline.o \
> >          rbtree.o radix-tree.o timerqueue.o xarray.o \
> >          idr.o extable.o \
> > diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> > new file mode 100644
> > index 000000000000..caf1111a28ae
> > --- /dev/null
> > +++ b/scripts/Makefile.kcsan
> > @@ -0,0 +1,6 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +ifdef CONFIG_KCSAN
> > +
> > +CFLAGS_KCSAN := -fsanitize=thread
> > +
> > +endif # CONFIG_KCSAN
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 179d55af5852..0e78abab7d83 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
> >         $(CFLAGS_KCOV))
> >  endif
> >
> > +#
> > +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> "KernelConcurrencySanitizer" or "Kernel Concurrency Sanitizer", maybe?

Done @ v3.

> > +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> > +#
> > +ifeq ($(CONFIG_KCSAN),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> > +       $(CFLAGS_KCSAN))
> > +endif
> > +
> >  # $(srctree)/$(src) for including checkin headers from generated source files
> >  # $(objtree)/$(obj) for including generated headers from checkin source files
> >  ifeq ($(KBUILD_EXTMOD),)
> > --
> > 2.23.0.866.gb869b98d4c-goog
> >

Thanks for your comments!
-- Marco


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-21 15:54       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-21 15:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern, Andrea Parri,
	Andrey Konovalov, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Boqun Feng, Borislav Petkov, Daniel Axtens, Daniel Lustig,
	dave.hansen, David Howells, Dmitriy Vyukov, H. Peter Anvin,
	Ingo Molnar, Jade Alglave, Joel Fernandes, Jonathan Corbet,
	Josh Poimboeuf, Luc Maranget

On Mon, 21 Oct 2019 at 15:37, Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> >
> > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> > kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> > See the included Documentation/dev-tools/kcsan.rst for more details.
> >
> > This patch adds basic infrastructure, but does not yet enable KCSAN for
> > any architecture.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Elaborate comment about instrumentation calls emitted by compilers.
> > * Replace kcsan_check_access(.., {true, false}) with
> >   kcsan_check_{read,write} for improved readability.
> > * Change bug title of race of unknown origin to just say "data-race in".
> > * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> > * Add comment about safety of find_watchpoint without user_access_save.
> > * Remove unnecessary preempt_disable/enable and elaborate on comment why
> >   we want to disable interrupts and preemptions.
> > * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
> >   contexts [Suggested by Mark Rutland].
> > ---
> >  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
> >  MAINTAINERS                       |  11 +
> >  Makefile                          |   3 +-
> >  include/linux/compiler-clang.h    |   9 +
> >  include/linux/compiler-gcc.h      |   7 +
> >  include/linux/compiler.h          |  35 ++-
> >  include/linux/kcsan-checks.h      | 147 ++++++++++
> >  include/linux/kcsan.h             | 108 ++++++++
> >  include/linux/sched.h             |   4 +
> >  init/init_task.c                  |   8 +
> >  init/main.c                       |   2 +
> >  kernel/Makefile                   |   1 +
> >  kernel/kcsan/Makefile             |  14 +
> >  kernel/kcsan/atomic.c             |  21 ++
> >  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
> >  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
> >  kernel/kcsan/encoding.h           |  94 +++++++
> >  kernel/kcsan/kcsan.c              |  86 ++++++
> >  kernel/kcsan/kcsan.h              | 140 ++++++++++
> >  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
> >  kernel/kcsan/test.c               | 117 ++++++++
> >  lib/Kconfig.debug                 |   2 +
> >  lib/Kconfig.kcsan                 |  88 ++++++
> >  lib/Makefile                      |   3 +
> >  scripts/Makefile.kcsan            |   6 +
> >  scripts/Makefile.lib              |  10 +
> >  26 files changed, 2069 insertions(+), 9 deletions(-)
> >  create mode 100644 Documentation/dev-tools/kcsan.rst
> >  create mode 100644 include/linux/kcsan-checks.h
> >  create mode 100644 include/linux/kcsan.h
> >  create mode 100644 kernel/kcsan/Makefile
> >  create mode 100644 kernel/kcsan/atomic.c
> >  create mode 100644 kernel/kcsan/core.c
> >  create mode 100644 kernel/kcsan/debugfs.c
> >  create mode 100644 kernel/kcsan/encoding.h
> >  create mode 100644 kernel/kcsan/kcsan.c
> >  create mode 100644 kernel/kcsan/kcsan.h
> >  create mode 100644 kernel/kcsan/report.c
> >  create mode 100644 kernel/kcsan/test.c
> >  create mode 100644 lib/Kconfig.kcsan
> >  create mode 100644 scripts/Makefile.kcsan
> >
> > diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> > new file mode 100644
> > index 000000000000..497b09e5cc96
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kcsan.rst
> > @@ -0,0 +1,203 @@
> > +The Kernel Concurrency Sanitizer (KCSAN)
> > +========================================
> > +
> > +Overview
> > +--------
> > +
> > +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> > +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> > +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> > +detector. Key priorities in KCSAN's design are lack of false positives,
> > +scalability, and simplicity. More details can be found in `Implementation
> > +Details`_.
> > +
> > +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> > +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> > +With Clang it requires version 7.0.0 or later.
> > +
> > +Usage
> > +-----
> > +
> > +To enable KCSAN, configure the kernel with::
> > +
> > +    CONFIG_KCSAN = y
> > +
> > +KCSAN provides several other configuration options to customize behaviour (see
> > +their respective help text for more info).
> > +
> > +debugfs
> > +~~~~~~~
> > +
> > +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> > +
> > +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> > +  ``/sys/kernel/debug/kcsan``.
> > +
> > +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> > +  ``some_func_name`` to the report filter list, which (by default) blacklists
> > +  reporting data-races where either one of the top stackframes is a function
> > +  in the list.
> > +
> > +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> > +  changes the report filtering behaviour. For example, the blacklist feature
> > +  can be used to silence frequently occurring data-races; the whitelist feature
> > +  can help with reproduction and testing of fixes.
> > +
> > +Error reports
> > +~~~~~~~~~~~~~
> > +
> > +A typical data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> > +
> > +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> > +     kernfs_refresh_inode+0x70/0x170
> > +     kernfs_iop_permission+0x4f/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     vfs_statx+0x9b/0x130
> > +     __do_sys_newlstat+0x50/0xb0
> > +     __x64_sys_newlstat+0x37/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> > +     generic_permission+0x5b/0x2a0
> > +     kernfs_iop_permission+0x66/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     do_faccessat+0x11a/0x390
> > +     __x64_sys_access+0x3c/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +The header of the report provides a short summary of the functions involved in
> > +the race. It is followed by the access types and stack traces of the 2 threads
> > +involved in the data-race.
> > +
> > +The other less common type of data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> > +
> > +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> > +     e1000_clean_rx_irq+0x551/0xb10
> > +     e1000_clean+0x533/0xda0
> > +     net_rx_action+0x329/0x900
> > +     __do_softirq+0xdb/0x2db
> > +     irq_exit+0x9b/0xa0
> > +     do_IRQ+0x9c/0xf0
> > +     ret_from_intr+0x0/0x18
> > +     default_idle+0x3f/0x220
> > +     arch_cpu_idle+0x21/0x30
> > +     do_idle+0x1df/0x230
> > +     cpu_startup_entry+0x14/0x20
> > +     rest_init+0xc5/0xcb
> > +     arch_call_rest_init+0x13/0x2b
> > +     start_kernel+0x6db/0x700
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +This report is generated where it was not possible to determine the other
> > +racing thread, but a race was inferred due to the data-value of the watched
> > +memory location having changed. These can occur either due to missing
> > +instrumentation or e.g. DMA accesses.
> > +
> > +Data-Races
> > +----------
> Nit: I was under the impression "data races" were commonly written
> without a hyphen. I may be mistaken.

Thanks. I've updated it everywhere except in bug titles, which should
remain as-is.

> > +
> > +Informally, two operations *conflict* if they access the same memory location,
> > +and at least one of them is a write operation. In an execution, two memory
> > +operations from different threads form a **data-race** if they *conflict*, at
> > +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> > +the "happens-before" order according to the `LKMM
> > +<../../tools/memory-model/Documentation/explanation.txt>`_.
> > +
> > +Relationship with the Linux Kernel Memory Model (LKMM)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The LKMM defines the propagation and ordering rules of various memory
> > +operations, which gives developers the ability to reason about concurrent code.
> > +Ultimately this allows determining the possible executions of concurrent code,
> > +and if that code is free from data-races.
> > +
> > +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> > +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> > +words, KCSAN assumes that as long as a plain access is not observed to race
> > +with another conflicting access, memory operations are correctly ordered.
> > +
> > +This means that KCSAN will not report *potential* data-races due to missing
> > +memory ordering. If, however, missing memory ordering (that is observable with
> > +a particular compiler and architecture) leads to an observable data-race (e.g.
> > +entering a critical section erroneously), KCSAN would report the resulting
> > +data-race.
> > +
> > +Implementation Details
> > +----------------------
> > +
> > +The general approach is inspired by `DataCollider
> > +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> > +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> > +relies on compiler instrumentation. Watchpoints are implemented using an
> > +efficient encoding that stores access type, size, and address in a long; the
> > +benefits of using "soft watchpoints" are portability and greater flexibility in
> > +limiting which accesses trigger a watchpoint.
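
To make the "access type, size, and address in a long" idea concrete,
here is a userspace sketch with a made-up bit layout. The actual layout
is defined in encoding.h; every WP_* constant and both wp_* helpers
below are hypothetical and only illustrate the packing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: [63] is_write | [62:56] size | [55:0] address bits. */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 56
#define WP_SIZE_MASK  UINT64_C(0x7f)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
#define WP_INVALID    UINT64_C(0) /* reserved: "no watchpoint here" */

/* Pack one access into a single word; size >= 1 keeps it != WP_INVALID. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size >= 1 && size <= WP_SIZE_MASK);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) | (addr & WP_ADDR_MASK);
}

/* Unpack a word; returns false for the reserved WP_INVALID value. */
static bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	if (wp == WP_INVALID)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the high address bits are masked away, two distinct addresses
can decode to the same masked address; this is why report.c re-checks
the real pointers and counts "encoding false positives".
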
> > +
> > +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> > +memory operations; for each instrumented plain access:
> > +
> > +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> > +   write, then we encountered a racing access.
> > +
> > +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> > +   stall for some delay.
> > +
> > +3. Also check the data value before the delay, and re-check the data value
> > +   after delay; if the values mismatch, we infer a race of unknown origin.
> > +
> > +To detect data-races between plain and atomic memory operations, KCSAN also
> > +annotates atomic accesses, but only to check if a watchpoint exists
> > +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> > +accesses.
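
The three steps above can be sketched as a single-threaded userspace
simulation. This is only a sketch under stated assumptions, not the
code in core.c: the slot table, all names, and the callback that stands
in for the delay are hypothetical, and the "other thread" is simulated
by running the callback while the watchpoint is armed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 4

enum race_kind { RACE_NONE, RACE_OBSERVED, RACE_UNKNOWN_ORIGIN };

struct soft_wp {
	const volatile void *ptr; /* NULL marks a free slot */
	size_t size;
	bool is_write;
	bool hit;                 /* set by a conflicting access */
};

static struct soft_wp slots[NUM_SLOTS];

static bool overlaps(const volatile void *p1, size_t s1,
		     const volatile void *p2, size_t s2)
{
	uintptr_t a1 = (uintptr_t)p1, a2 = (uintptr_t)p2;

	return a1 < a2 + s2 && a2 < a1 + s1;
}

/* Step 1: run on every instrumented access. */
static bool check_access(const volatile void *ptr, size_t size, bool is_write)
{
	for (int i = 0; i < NUM_SLOTS; ++i) {
		struct soft_wp *wp = &slots[i];

		/* A race needs overlap and at least one write. */
		if (wp->ptr && overlaps(wp->ptr, wp->size, ptr, size) &&
		    (wp->is_write || is_write)) {
			wp->hit = true;
			return true;
		}
	}
	return false;
}

/* Steps 2 and 3: run only on the sampled subset of plain accesses. */
static enum race_kind watch_access(const volatile void *ptr, size_t size,
				   bool is_write, void (*delay)(void))
{
	unsigned char before[8], after[8];
	enum race_kind result = RACE_NONE;
	struct soft_wp *wp = NULL;

	assert(size >= 1 && size <= sizeof(before));
	for (int i = 0; i < NUM_SLOTS && !wp; ++i)
		if (!slots[i].ptr)
			wp = &slots[i];
	if (!wp)
		return RACE_NONE; /* no free slot: skip this sample */

	memcpy(before, (const void *)ptr, size); /* value before the delay */
	*wp = (struct soft_wp){ .ptr = ptr, .size = size, .is_write = is_write };
	delay(); /* stands in for the stall; other accesses happen here */
	memcpy(after, (const void *)ptr, size);  /* value after the delay */

	if (wp->hit)
		result = RACE_OBSERVED;
	else if (memcmp(before, after, size) != 0)
		result = RACE_UNKNOWN_ORIGIN;
	wp->ptr = NULL; /* release the slot */
	return result;
}

/* Simulated concurrent activity for the usage example. */
static int shared;
static void idle(void) { }
static void racing_reader(void) { check_access(&shared, sizeof(shared), false); }
static void racing_writer(void) { check_access(&shared, sizeof(shared), true); shared++; }
static void uninstrumented_writer(void) { shared++; }
```

Watching a read of ``shared`` while racing_writer runs yields
RACE_OBSERVED, while uninstrumented_writer (think DMA or missing
instrumentation) yields RACE_UNKNOWN_ORIGIN. A marked atomic access
would only ever call check_access(), never watch_access().
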
> > +
> > +Key Properties
> > +~~~~~~~~~~~~~~
> > +
> > +1. **Memory Overhead:** No shadow memory is required. The current
> > +   implementation uses a small array of longs to encode watchpoint information,
> > +   which is negligible.
> > +
> > +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> > +   efficient watchpoint encoding that does not require acquiring any shared
> > +   locks in the fast-path. For kernel boot with a default config on a system
> > +   where nproc=8 we measure a slow-down of 10-15x.
> > +
> > +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> > +   may result in missed data-races (false negatives), compared to a
> > +   happens-before data-race detector.
> > +
> > +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> > +
> > +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> > +   runtime. With a happens-before data-race detector, any omission leads to
> > +   false positives, which is especially important in the context of the kernel
> > +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> > +   result, maintenance overheads are minimal as the kernel evolves.
> > +
> > +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> > +   setting up watchpoints, racy writes from devices can also be detected.
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0154674cbad3..71f7fb625490 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
> >  F:     scripts/kconfig/
> >  F:     scripts/Kconfig.include
> >
> > +KCSAN
> > +M:     Marco Elver <elver@google.com>
> > +R:     Dmitry Vyukov <dvyukov@google.com>
> > +L:     kasan-dev@googlegroups.com
> > +S:     Maintained
> > +F:     Documentation/dev-tools/kcsan.rst
> > +F:     include/linux/kcsan*.h
> > +F:     kernel/kcsan/
> > +F:     lib/Kconfig.kcsan
> > +F:     scripts/Makefile.kcsan
> > +
> >  KDUMP
> >  M:     Dave Young <dyoung@redhat.com>
> >  M:     Baoquan He <bhe@redhat.com>
> > diff --git a/Makefile b/Makefile
> > index ffd7a912fc46..ad4729176252 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
> >
> >  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
> >  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> > -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> > +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
> >  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
> >  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
> >  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> > @@ -900,6 +900,7 @@ endif
> >  include scripts/Makefile.kasan
> >  include scripts/Makefile.extrawarn
> >  include scripts/Makefile.ubsan
> > +include scripts/Makefile.kcsan
> >
> >  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
> >  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > index 333a6695a918..a213eb55e725 100644
> > --- a/include/linux/compiler-clang.h
> > +++ b/include/linux/compiler-clang.h
> > @@ -24,6 +24,15 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_feature(thread_sanitizer)
> > +/* emulate gcc's __SANITIZE_THREAD__ flag */
> > +#define __SANITIZE_THREAD__
> > +#define __no_sanitize_thread \
> > +               __attribute__((no_sanitize("thread")))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  /*
> >   * Not all versions of clang implement the the type-generic versions
> >   * of the builtin overflow checkers. Fortunately, clang implements
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index d7ee4c6bad48..de105ca29282 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -145,6 +145,13 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> > +#define __no_sanitize_thread                                                   \
> > +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  #if GCC_VERSION >= 50100
> >  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
> >  #endif
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 5e88e7e33abe..350d80dbee4d 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >  #endif
> >
> >  #include <uapi/linux/types.h>
> > +#include <linux/kcsan-checks.h>
> >
> >  #define __READ_ONCE_SIZE                                               \
> >  ({                                                                     \
> > @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >         }                                                               \
> >  })
> >
> > -static __always_inline
> > -void __read_once_size(const volatile void *p, void *res, int size)
> > -{
> > -       __READ_ONCE_SIZE;
> > -}
> > -
> >  #ifdef CONFIG_KASAN
> >  /*
> >   * We can't declare function 'inline' because __no_sanitize_address confilcts
> > @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
> >  # define __no_kasan_or_inline __always_inline
> >  #endif
> >
> > -static __no_kasan_or_inline
> > +#ifdef CONFIG_KCSAN
> > +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +# define __no_kcsan_or_inline __always_inline
> > +#endif
> > +
> > +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> > +/* Avoid any instrumentation or inline. */
> > +#define __no_sanitize_or_inline                                                \
> > +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +#define __no_sanitize_or_inline __always_inline
> > +#endif
> > +
> > +static __no_kcsan_or_inline
> > +void __read_once_size(const volatile void *p, void *res, int size)
> > +{
> > +       kcsan_check_atomic_read((const void *)p, size);
> > +       __READ_ONCE_SIZE;
> > +}
> > +
> > +static __no_sanitize_or_inline
> >  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
> >  {
> >         __READ_ONCE_SIZE;
> >  }
> >
> > -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> > +static __no_kcsan_or_inline
> > +void __write_once_size(volatile void *p, void *res, int size)
> >  {
> > +       kcsan_check_atomic_write((const void *)p, size);
> > +
> >         switch (size) {
> >         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> >         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> > diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> > new file mode 100644
> > index 000000000000..4203603ae852
> > --- /dev/null
> > +++ b/include/linux/kcsan-checks.h
> > @@ -0,0 +1,147 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_CHECKS_H
> > +#define _LINUX_KCSAN_CHECKS_H
> > +
> > +#include <linux/types.h>
> > +
> > +/*
> > + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> > + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> > + * to validate access to an address.   Never use these in header files!
> > + */
> > +#ifdef CONFIG_KCSAN
> > +/**
> > + * __kcsan_check_watchpoint - check if a watchpoint exists
> > + *
> > + * Returns true if no race was detected, and we may then proceed to set up a
> > + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> > + * encountered, and we may not set up a watchpoint after.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + * @return true if no race was detected, false otherwise.
> > + */
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> I think the parameter indentations are a bit off here and below (I've
> also looked at the Github diff);
> have you considered running checkpatch.pl?

It was formatted with clang-format; it's correct with 8-space tabs.
checkpatch.pl is happy.

> > +
> > +/**
> > + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> > + *
> > + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> > + * reports the data-race.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + */
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> > +#else
> > +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/*
> > + * kcsan_*: Only available when the particular compilation unit has KCSAN
> > + * instrumentation enabled. May be used in header files.
> > + */
> > +#ifdef __SANITIZE_THREAD__
> > +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> > +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> > +#else
> > +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/**
> > + * __kcsan_check_read - check regular read access for data-races
> > + *
> > + * Full read access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled. Note that, setting up watchpoints for plain reads is
> > + * required to also detect data-races with atomic accesses.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_read(ptr, size)                                          \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> > +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> > +       } while (0)
> > +
> > +/**
> > + * __kcsan_check_write - check regular write access for data-races
> > + *
> > + * Full write access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_write(ptr, size)                                         \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_read - check regular read access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_read(ptr, size)                                            \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> > +                       kcsan_setup_watchpoint(ptr, size, false);              \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_write - check regular write access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_write(ptr, size)                                           \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       kcsan_setup_watchpoint(ptr, size, true);               \
> > +       } while (0)
> > +
> > +/*
> > + * Check for atomic accesses: if atomic access are not ignored, this simply
> > + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> > + */
> > +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> > +#define kcsan_check_atomic_read(...)                                           \
> > +       do {                                                                   \
> > +       } while (0)
> > +#define kcsan_check_atomic_write(...)                                          \
> > +       do {                                                                   \
> > +       } while (0)
> > +#else
> > +#define kcsan_check_atomic_read(ptr, size)                                     \
> > +       kcsan_check_watchpoint(ptr, size, false)
> > +#define kcsan_check_atomic_write(ptr, size)                                    \
> > +       kcsan_check_watchpoint(ptr, size, true)
> > +#endif
> > +
> > +#endif /* _LINUX_KCSAN_CHECKS_H */
> > diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> > new file mode 100644
> > index 000000000000..fd5de2ba3a16
> > --- /dev/null
> > +++ b/include/linux/kcsan.h
> > @@ -0,0 +1,108 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_H
> > +#define _LINUX_KCSAN_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +#ifdef CONFIG_KCSAN
> > +
> > +/*
> > + * Context for each thread of execution: for tasks, this is stored in
> > + * task_struct, and interrupts access internal per-CPU storage.
> > + */
> > +struct kcsan_ctx {
> > +       int disable; /* disable counter */
> > +       int atomic_next; /* number of following atomic ops */
> > +
> > +       /*
> > +        * We use separate variables to store if we are in a nestable or flat
> > +        * atomic region. This helps make sure that an atomic region with
> > +        * nesting support is not suddenly aborted when a flat region is
> > +        * contained within. Effectively this allows supporting nesting flat
> > +        * atomic regions within an outer nestable atomic region. Support for
> > +        * this is required as there are cases where a seqlock reader critical
> > +        * section (flat atomic region) is contained within a seqlock writer
> > +        * critical section (nestable atomic region), and the "mismatching
> > +        * kcsan_end_atomic()" warning would trigger otherwise.
> > +        */
> > +       int atomic_region;
> > +       bool atomic_region_flat;
> > +};
> > +
> > +/**
> > + * kcsan_init - initialize KCSAN runtime
> > + */
> > +void kcsan_init(void);
> > +
> > +/**
> > + * kcsan_disable_current - disable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_disable_current(void);
> > +
> > +/**
> > + * kcsan_enable_current - re-enable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_enable_current(void);
> > +
> > +/**
> > + * kcsan_begin_atomic - use to denote an atomic region
> > + *
> > + * Accesses within the atomic region may appear to race with other accesses but
> > + * should be considered atomic.
> > + *
> > + * @nest true if regions may be nested, or false for flat region
> > + */
> > +void kcsan_begin_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_end_atomic - end atomic region
> > + *
> > + * @nest must match argument to kcsan_begin_atomic().
> > + */
> > +void kcsan_end_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_atomic_next - consider following accesses as atomic
> > + *
> > + * Force treating the next n memory accesses for the current context as atomic
> > + * operations.
> > + *
> > + * @n number of following memory accesses to treat as atomic.
> > + */
> > +void kcsan_atomic_next(int n);
> > +
> > +#else /* CONFIG_KCSAN */
> > +
> > +static inline void kcsan_init(void)
> I think it should be ok to put {} on the same line with the function
> prototype here, see e.g. include/linux/kasan.h

Done @ v3.

> > +{
> > +}
> > +
> > +static inline void kcsan_disable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_enable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_begin_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_end_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_atomic_next(int n)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_KCSAN */
> > +
> > +#endif /* _LINUX_KCSAN_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 2c2e56bd8913..9490e417bf4a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -31,6 +31,7 @@
> >  #include <linux/task_io_accounting.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/rseq.h>
> > +#include <linux/kcsan.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> >  struct audit_context;
> > @@ -1171,6 +1172,9 @@ struct task_struct {
> >  #ifdef CONFIG_KASAN
> >         unsigned int                    kasan_depth;
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       struct kcsan_ctx                kcsan_ctx;
> > +#endif
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         /* Index of current stored address in ret_stack: */
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 9e5cbe5eab7b..e229416c3314 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -161,6 +161,14 @@ struct task_struct init_task
> >  #ifdef CONFIG_KASAN
> >         .kasan_depth    = 1,
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       .kcsan_ctx = {
> > +               .disable                = 1,
> > +               .atomic_next            = 0,
> > +               .atomic_region          = 0,
> > +               .atomic_region_flat     = 0,
> > +       },
> > +#endif
> >  #ifdef CONFIG_TRACE_IRQFLAGS
> >         .softirqs_enabled = 1,
> >  #endif
> > diff --git a/init/main.c b/init/main.c
> > index 91f6ebb30ef0..4d814de017ee 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -93,6 +93,7 @@
> >  #include <linux/rodata_test.h>
> >  #include <linux/jump_label.h>
> >  #include <linux/mem_encrypt.h>
> > +#include <linux/kcsan.h>
> >
> >  #include <asm/io.h>
> >  #include <asm/bugs.h>
> > @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
> >         acpi_subsystem_init();
> >         arch_post_acpi_subsys_init();
> >         sfi_init_late();
> > +       kcsan_init();
> >
> >         /* Do the rest non-__init'ed, we're now alive */
> >         arch_call_rest_init();
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index daad787fb795..74ab46e2ebd1 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
> >  obj-$(CONFIG_IRQ_WORK) += irq_work.o
> >  obj-$(CONFIG_CPU_PM) += cpu_pm.o
> >  obj-$(CONFIG_BPF) += bpf/
> > +obj-$(CONFIG_KCSAN) += kcsan/
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> > new file mode 100644
> > index 000000000000..c25f07062d26
> > --- /dev/null
> > +++ b/kernel/kcsan/Makefile
> > @@ -0,0 +1,14 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +KCSAN_SANITIZE := n
> > +KCOV_INSTRUMENT := n
> > +
> > +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> > +
> > +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +
> > +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> > +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> > diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> > new file mode 100644
> > index 000000000000..dd44f7d9e491
> > --- /dev/null
> > +++ b/kernel/kcsan/atomic.c
> > @@ -0,0 +1,21 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/jiffies.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * List all volatile globals that have been observed in races, to suppress
> > + * data-race reports between accesses to these variables.
> > + *
> > + * For now, we assume that volatile accesses of globals are as strong as atomic
> > + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> > + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> > + * than cast to volatile. Eventually, we hope to be able to remove this
> > + * function.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr)
> > +{
> > +       /* only jiffies for now */
> > +       return ptr == &jiffies;
> > +}
> > diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> > new file mode 100644
> > index 000000000000..bc8d60b129eb
> > --- /dev/null
> > +++ b/kernel/kcsan/core.c
> > @@ -0,0 +1,428 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bug.h>
> > +#include <linux/delay.h>
> > +#include <linux/export.h>
> > +#include <linux/init.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/random.h>
> > +#include <linux/sched.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Helper macros to iterate slots, starting from address slot itself, followed
> > + * by the right and left slots.
> > + */
> > +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> > +#define SLOT_IDX(slot, i)                                                      \
> > +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> > +                 KCSAN_CHECK_ADJACENT)) %                                     \
> > +        KCSAN_NUM_WATCHPOINTS)
> > +
> > +bool kcsan_enabled;
> > +
> > +/* Per-CPU kcsan_ctx for interrupts */
> > +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> > +       .disable = 0,
> > +       .atomic_next = 0,
> > +       .atomic_region = 0,
> > +       .atomic_region_flat = 0,
> > +};
> > +
> > +/*
> > + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> > + * able to safely update and access a watchpoint without introducing locking
> > + * overhead, we encode each watchpoint as a single atomic long. The initial
> > + * zero-initialized state matches INVALID_WATCHPOINT.
> > + */
> > +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> > +
> > +/*
> > + * Instructions skipped counter; see should_watch().
> > + */
> > +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> > +
> > +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> > +                                            bool expect_write,
> > +                                            long *encoded_watchpoint)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> > +       atomic_long_t *watchpoint;
> > +       unsigned long wp_addr_masked;
> > +       size_t wp_size;
> > +       bool is_write;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               *encoded_watchpoint = atomic_long_read(watchpoint);
> > +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> > +                                      &wp_size, &is_write))
> > +                       continue;
> > +
> > +               if (expect_write && !is_write)
> > +                       continue;
> > +
> > +               /* Check if the watchpoint matches the access. */
> > +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> > +                                              bool is_write)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> > +       atomic_long_t *watchpoint;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               long expect_val = INVALID_WATCHPOINT;
> > +
> > +               /* Try to acquire this slot. */
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> > +                                                   encoded_watchpoint))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was successfully consumed, false otherwise.
> > + *
> > + * This may return false if:
> > + *
> > + *     1. another thread already consumed the watchpoint;
> > + *     2. the thread that set up the watchpoint already removed it;
> > + *     3. the watchpoint was removed and then re-used.
> > + */
> > +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> > +                                         long encoded_watchpoint)
> > +{
> > +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> > +                                              CONSUMED_WATCHPOINT);
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was not touched, false if consumed.
> > + */
> > +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> > +{
> > +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> > +              CONSUMED_WATCHPOINT;
> > +}
> > +
> > +static inline struct kcsan_ctx *get_ctx(void)
> > +{
> > +       /*
> > +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> > +        * also result in calls that generate warnings in uaccess regions.
> > +        */
> > +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> > +}
> > +
> > +
> > +static inline bool is_atomic(const volatile void *ptr)
> > +{
> > +       struct kcsan_ctx *ctx = get_ctx();
> > +
> > +       if (unlikely(ctx->atomic_next > 0)) {
> > +               --ctx->atomic_next;
> > +               return true;
> > +       }
> > +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> > +               return true;
> Won't ctx->atomic_region suffice for both flat and non-flat regions?
> (Do we really need the flat ones?)

The comment in include/linux/kcsan.h explains:
/*
 * We use separate variables to store if we are in a nestable or flat
 * atomic region. This helps make sure that an atomic region with
 * nesting support is not suddenly aborted when a flat region is
 * contained within. Effectively this allows supporting nesting flat
 * atomic regions within an outer nestable atomic region. Support for
 * this is required as there are cases where a seqlock reader critical
 * section (flat atomic region) is contained within a seqlock writer
 * critical section (nestable atomic region), and the "mismatching
 * kcsan_end_atomic()" warning would trigger otherwise.
 */
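
To make the case in that comment concrete, here is a small user-space sketch of a flat region nested inside a nestable one. It mirrors only the two fields of struct kcsan_ctx discussed here. The *_sketch names are hypothetical, and returning false on a mismatch stands in for the WARN() in the patch.

```c
#include <stdbool.h>

/* Only the two fields relevant to atomic regions. */
struct ctx_sketch {
	int atomic_region;       /* nestable regions: a counter */
	bool atomic_region_flat; /* flat regions: a single flag */
};

static void begin_atomic_sketch(struct ctx_sketch *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

/* Returns false on a mismatching end of a nestable region. */
static bool end_atomic_sketch(struct ctx_sketch *ctx, bool nest)
{
	if (!nest) {
		ctx->atomic_region_flat = false;
		return true;
	}
	if (ctx->atomic_region == 0)
		return false;
	--ctx->atomic_region;
	return true;
}

static bool in_atomic_sketch(const struct ctx_sketch *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

With a writer doing begin(true), a contained reader doing begin(false) followed by end(false), and then the writer doing end(true), the context stays atomic until the writer's end. Ending the flat region clears only the flag and leaves the writer's counter untouched.
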


> > +       return kcsan_is_atomic(ptr);
> > +}
> > +
> > +static inline bool should_watch(const volatile void *ptr)
> > +{
> > +       /*
> > +        * Never set up watchpoints when memory operations are atomic.
> > +        *
> > +        * We need to check this first, because: 1) atomics should not count
> > +        * towards skipped instructions below, and 2) to actually decrement
> > +        * kcsan_atomic_next for each atomic.
> > +        */
> > +       if (is_atomic(ptr))
> > +               return false;
> > +
> > +       /*
> > +        * We use a per-CPU counter, to avoid excessive contention; there is
> > +        * still enough non-determinism for the precise instructions that end up
> > +        * being watched to be mostly unpredictable. Using a PRNG like
> > +        * prandom_u32() turned out to be too slow.
> > +        */
> > +       return (this_cpu_inc_return(kcsan_skip) %
> > +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> > +}
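
As an aside, the sampling decision in should_watch() can be sketched in user space as below. This is an illustration only: a plain counter passed by pointer stands in for the per-CPU kcsan_skip, SKETCH_WATCH_SKIP is a made-up value rather than the Kconfig default, and should_watch_sketch is a hypothetical name.

```c
#include <stdbool.h>

#define SKETCH_WATCH_SKIP 2000 /* hypothetical; not the Kconfig default */

/*
 * Atomic accesses are rejected first so that they do not advance the skip
 * counter; every SKETCH_WATCH_SKIP-th remaining access is selected.
 */
static bool should_watch_sketch(unsigned long *skip, bool is_atomic)
{
	if (is_atomic)
		return false;

	return (++*skip % SKETCH_WATCH_SKIP) == 0;
}
```

Over 2 * SKETCH_WATCH_SKIP plain accesses exactly two are selected, and interleaved atomic accesses leave the counter unchanged.
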
> > +
> > +static inline bool is_enabled(void)
> > +{
> > +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> > +}
> > +
> > +static inline unsigned int get_delay(void)
> > +{
> > +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> > +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> > +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                      ((prandom_u32() % max_delay) + 1) :
> > +                      max_delay;
> > +}
> > +
> > +/* === Public interface ===================================================== */
> > +
> > +void __init kcsan_init(void)
> > +{
> > +       BUG_ON(!in_task());
> > +
> > +       kcsan_debugfs_init();
> > +       kcsan_enable_current();
> > +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> > +       /*
> > +        * We are in the init task, and no other tasks should be running.
> > +        */
> > +       WRITE_ONCE(kcsan_enabled, true);
> > +#endif
> > +}
> > +
> > +/* === Exported interface =================================================== */
> > +
> > +void kcsan_disable_current(void)
> > +{
> > +       ++get_ctx()->disable;
> > +}
> > +EXPORT_SYMBOL(kcsan_disable_current);
> > +
> > +void kcsan_enable_current(void)
> > +{
> > +       if (get_ctx()->disable-- == 0) {
> > +               kcsan_disable_current(); /* restore to 0 */
> > +               kcsan_disable_current();
> > +               WARN(1, "mismatching %s", __func__);
> > +               kcsan_enable_current();
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_enable_current);
> > +
> > +void kcsan_begin_atomic(bool nest)
> > +{
> > +       if (nest)
> > +               ++get_ctx()->atomic_region;
> > +       else
> > +               get_ctx()->atomic_region_flat = true;
> > +}
> > +EXPORT_SYMBOL(kcsan_begin_atomic);
> > +
> > +void kcsan_end_atomic(bool nest)
> > +{
> > +       if (nest) {
> > +               if (get_ctx()->atomic_region-- == 0) {
> > +                       kcsan_begin_atomic(true); /* restore to 0 */
> > +                       kcsan_disable_current();
> > +                       WARN(1, "mismatching %s", __func__);
> > +                       kcsan_enable_current();
> > +               }
> > +       } else {
> > +               get_ctx()->atomic_region_flat = false;
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_end_atomic);
> > +
> > +void kcsan_atomic_next(int n)
> > +{
> > +       get_ctx()->atomic_next = n;
> > +}
> > +EXPORT_SYMBOL(kcsan_atomic_next);
> > +
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       long encoded_watchpoint;
> > +       unsigned long flags;
> > +       enum kcsan_report_type report_type;
> > +
> > +       if (unlikely(!is_enabled()))
> > +               return false;
> > +
> > +       /*
> > +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> > +        * without user_access_save, as the address that ptr points to is only
> > +        * used to check if a watchpoint exists; ptr is never dereferenced.
> > +        */
> > +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> > +                                    &encoded_watchpoint);
> > +       if (watchpoint == NULL)
> > +               return true;
> > +
> > +       flags = user_access_save();
> > +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> > +               /*
> > +                * The other thread may not print any diagnostics, as it has
> > +                * already removed the watchpoint, or another thread consumed
> > +                * the watchpoint before this thread.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_report_races);
> > +               report_type = kcsan_report_race_check_race;
> > +       } else {
> > +               report_type = kcsan_report_race_check;
> > +       }
> > +
> > +       /* Encountered a data-race. */
> > +       kcsan_counter_inc(kcsan_counter_data_races);
> > +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> > +
> > +       user_access_restore(flags);
> > +       return false;
> > +}
> > +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> > +
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       union {
> > +               u8 _1;
> > +               u16 _2;
> > +               u32 _4;
> > +               u64 _8;
> > +       } expect_value;
> > +       bool is_expected = true;
> > +       unsigned long ua_flags = user_access_save();
> > +       unsigned long irq_flags;
> > +
> > +       if (!should_watch(ptr))
> > +               goto out;
> > +
> > +       if (!check_encodable((unsigned long)ptr, size)) {
> > +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> > +               goto out;
> > +       }
> > +
> > +       /*
> > +        * Disable interrupts & preemptions to avoid another thread on the same
> > +        * CPU accessing memory locations for the set up watchpoint; this is to
> > +        * avoid reporting races to e.g. CPU-local data.
> > +        *
> > +        * An alternative would be adding the source CPU to the watchpoint
> > +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> > +        * several problems with this:
> > +        *   1. we should avoid stealing more bits from the watchpoint encoding
> > +        *      as it would affect accuracy, as well as increase performance
> > +        *      overhead in the fast-path;
> > +        *   2. if we are preempted, but there *is* a genuine data-race, we
> > +        *      would *not* report it -- since this is the common case (vs.
> > +        *      CPU-local data accesses), it makes more sense (from a data-race
> > +        *      detection PoV) to simply disable preemptions to ensure as many
> > +        *      tasks as possible run on other CPUs.
> > +        */
> > +       local_irq_save(irq_flags);
> > +
> > +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> > +       if (watchpoint == NULL) {
> > +               /*
> > +                * Out of capacity: the size of `watchpoints`, and the frequency
> > +                * with which `should_watch()` returns true should be tweaked so
> > +                * that this case happens very rarely.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_no_capacity);
> > +               goto out_unlock;
> > +       }
> > +
> > +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> > +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> > +
> > +       /*
> > +        * Read the current value, to later check and infer a race if the data
> > +        * was modified via a non-instrumented access, e.g. from a device.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +#ifdef CONFIG_KCSAN_DEBUG
> > +       kcsan_disable_current();
> > +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +              is_write ? "write" : "read", size, ptr,
> > +              watchpoint_slot((unsigned long)ptr),
> > +              encode_watchpoint((unsigned long)ptr, size, is_write));
> > +       kcsan_enable_current();
> > +#endif
> > +
> > +       /*
> > +        * Delay this thread, to increase probability of observing a racy
> > +        * conflicting access.
> > +        */
> > +       udelay(get_delay());
> > +
> > +       /*
> > +        * Re-read value, and check if it is as expected; if not, we infer a
> > +        * racy access.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +       /* Check if this access raced with another. */
> > +       if (!remove_watchpoint(watchpoint)) {
> > +               /*
> > +                * No need to increment 'race' counter, as the racing thread
> > +                * already did.
> > +                */
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_setup);
> > +       } else if (!is_expected) {
> > +               /* Inferring a race, since the value should not have changed. */
> > +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_unknown_origin);
> > +#endif
> > +       }
> > +
> > +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> > +out_unlock:
> > +       local_irq_restore(irq_flags);
> > +out:
> > +       user_access_restore(ua_flags);
> > +}
> > +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
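
The value check in __kcsan_setup_watchpoint() above (snapshot, stall,
re-read, compare) can be modelled in userspace roughly as follows. This is
only a sketch under stated assumptions: value_changed_during() and the two
stall callbacks are hypothetical names invented for this example, the
callback stands in for udelay(), and memcpy() stands in for READ_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the `size` bytes at `ptr` differ
 * before and after `stall(arg)` runs. Sizes other than 1, 2, 4 and 8 are
 * not compared, mirroring the "default: break" arms in the patch.
 */
static bool value_changed_during(const volatile void *ptr, size_t size,
				 void (*stall)(void *), void *arg)
{
	uint64_t before = 0, after = 0;

	assert(ptr != NULL && stall != NULL);
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;

	memcpy(&before, (const void *)ptr, size);	/* snapshot */
	stall(arg);					/* stands in for udelay() */
	memcpy(&after, (const void *)ptr, size);	/* re-read */

	return before != after;
}

/* Example stall that leaves memory untouched. */
static void stall_nop(void *arg)
{
	(void)arg;
}

/* Example stall that simulates a conflicting, uninstrumented writer. */
static void stall_increment_u32(void *arg)
{
	++*(uint32_t *)arg;
}
```

With stall_nop() the helper reports no change; with stall_increment_u32()
pointed at the watched word it reports a change, which corresponds to the
"race of unknown origin" case in the patch.
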
> > diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> > new file mode 100644
> > index 000000000000..6ddcbd185f3a
> > --- /dev/null
> > +++ b/kernel/kcsan/debugfs.c
> > @@ -0,0 +1,225 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bsearch.h>
> > +#include <linux/bug.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/init.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/mm.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/sort.h>
> > +#include <linux/string.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * Statistics counters.
> > + */
> > +static atomic_long_t counters[kcsan_counter_count];
> > +
> > +/*
> > + * Addresses for filtering functions from reporting. This list can be used as a
> > + * whitelist or blacklist.
> > + */
> > +static struct {
> > +       unsigned long *addrs; /* array of addresses */
> > +       size_t size; /* current size */
> > +       int used; /* number of elements used */
> > +       bool sorted; /* if elements are sorted */
> > +       bool whitelist; /* if list is a blacklist or whitelist */
> > +} report_filterlist = {
> > +       .addrs = NULL,
> > +       .size = 8, /* small initial size */
> > +       .used = 0,
> > +       .sorted = false,
> > +       .whitelist = false, /* default is blacklist */
> > +};
> > +static DEFINE_SPINLOCK(report_filterlist_lock);
> > +
> > +static const char *counter_to_name(enum kcsan_counter_id id)
> > +{
> > +       switch (id) {
> > +       case kcsan_counter_used_watchpoints:
> > +               return "used_watchpoints";
> > +       case kcsan_counter_setup_watchpoints:
> > +               return "setup_watchpoints";
> > +       case kcsan_counter_data_races:
> > +               return "data_races";
> > +       case kcsan_counter_no_capacity:
> > +               return "no_capacity";
> > +       case kcsan_counter_report_races:
> > +               return "report_races";
> > +       case kcsan_counter_races_unknown_origin:
> > +               return "races_unknown_origin";
> > +       case kcsan_counter_unencodable_accesses:
> > +               return "unencodable_accesses";
> > +       case kcsan_counter_encoding_false_positives:
> > +               return "encoding_false_positives";
> > +       case kcsan_counter_count:
> > +               BUG();
> > +       }
> > +       return NULL;
> > +}
> > +
> > +void kcsan_counter_inc(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_inc(&counters[id]);
> > +}
> > +
> > +void kcsan_counter_dec(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_dec(&counters[id]);
> > +}
> > +
> > +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> > +{
> > +       const unsigned long a = *(const unsigned long *)rhs;
> > +       const unsigned long b = *(const unsigned long *)lhs;
> > +
> > +       return a < b ? -1 : a == b ? 0 : 1;
> > +}
> > +
> > +bool kcsan_skip_report(unsigned long func_addr)
> > +{
> > +       unsigned long symbolsize, offset;
> > +       unsigned long flags;
> > +       bool ret = false;
> > +
> > +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> > +               return false;
> > +       func_addr -= offset; /* get function start */
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       if (report_filterlist.used == 0)
> > +               goto out;
> > +
> > +       /* Sort array if it is unsorted, and then do a binary search. */
> > +       if (!report_filterlist.sorted) {
> > +               sort(report_filterlist.addrs, report_filterlist.used,
> > +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> > +               report_filterlist.sorted = true;
> > +       }
> > +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> > +                       report_filterlist.used, sizeof(unsigned long),
> > +                       cmp_filterlist_addrs);
> > +       if (report_filterlist.whitelist)
> > +               ret = !ret;
> > +
> > +out:
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +       return ret;
> > +}
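
The lookup in kcsan_skip_report() (lazy sort, binary search, then invert
the result in whitelist mode) can be sketched in userspace as below. The
names sketch_filterlist, cmp_addrs() and sketch_skip_report() are
assumptions for illustration; qsort() and bsearch() from the C library
stand in for the kernel's sort() and bsearch(), and locking and the
kallsyms lookup are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_filterlist {
	unsigned long *addrs;	/* array of function start addresses */
	size_t used;		/* number of elements used */
	bool sorted;		/* true once addrs[] has been sorted */
	bool whitelist;		/* false: blacklist, true: whitelist */
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for func_addr should be suppressed. */
static bool sketch_skip_report(struct sketch_filterlist *fl,
			       unsigned long func_addr)
{
	bool found;

	if (fl->used == 0)
		return false;	/* empty list: nothing is skipped, in either mode */
	assert(fl->addrs != NULL);

	/* Sort lazily on first lookup after an insertion. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}

	found = bsearch(&func_addr, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip all others. */
	return fl->whitelist ? !found : found;
}
```

Note that, as in the quoted code, an empty list never suppresses a report,
even after switching to whitelist mode.
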
> > +
> > +static void set_report_filterlist_whitelist(bool whitelist)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       report_filterlist.whitelist = whitelist;
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static void insert_report_filterlist(const char *func)
> > +{
> > +       unsigned long flags;
> > +       unsigned long addr = kallsyms_lookup_name(func);
> > +
> > +       if (!addr) {
> > +               pr_err("KCSAN: could not find function: '%s'\n", func);
> > +               return;
> > +       }
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +
> > +       if (report_filterlist.addrs == NULL)
> > +               report_filterlist.addrs = /* initial allocation */
> > +                       kvmalloc_array(report_filterlist.size,
> > +                                      sizeof(unsigned long), GFP_KERNEL);
> You need to use braces in both branches here:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

Done @ v3.

> > +       else if (report_filterlist.used == report_filterlist.size) {
> > +               /* resize filterlist */
> > +               unsigned long *new_addrs;
> > +
> > +               report_filterlist.size *= 2;
> > +               new_addrs = kvmalloc_array(report_filterlist.size,
> > +                                          sizeof(unsigned long), GFP_KERNEL);
> > +               memcpy(new_addrs, report_filterlist.addrs,
> > +                      report_filterlist.used * sizeof(unsigned long));
> > +               kvfree(report_filterlist.addrs);
> > +               report_filterlist.addrs = new_addrs;
> > +       }
> > +
> > +       /* Note: deduplicating should be done in userspace. */
> > +       report_filterlist.addrs[report_filterlist.used++] =
> > +               kallsyms_lookup_name(func);
> > +       report_filterlist.sorted = false;
> > +
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
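
For reference, the insertion path with braces in both branches could look
roughly like the userspace sketch below. It is an illustration only, not
the v3 code: struct filterlist and filterlist_insert() are names assumed
for this example, calloc()/free() stand in for kvmalloc_array()/kvfree(),
and locking is left out. Unlike the quoted hunk, the sketch checks the
allocation result before using it and reuses the address that was already
looked up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct filterlist {
	unsigned long *addrs;	/* array of addresses */
	size_t size;		/* current capacity */
	size_t used;		/* number of elements used */
};

/* Returns 0 on success, -1 if an allocation failed (list left unchanged). */
static int filterlist_insert(struct filterlist *fl, unsigned long addr)
{
	assert(fl->size > 0);

	if (fl->addrs == NULL) {
		/* initial allocation */
		fl->addrs = calloc(fl->size, sizeof(*fl->addrs));
		if (fl->addrs == NULL)
			return -1;
	} else if (fl->used == fl->size) {
		/* resize: double the capacity and copy the old entries */
		size_t new_size = fl->size * 2;
		unsigned long *new_addrs = calloc(new_size, sizeof(*new_addrs));

		if (new_addrs == NULL)
			return -1;
		memcpy(new_addrs, fl->addrs, fl->used * sizeof(*new_addrs));
		free(fl->addrs);
		fl->addrs = new_addrs;
		fl->size = new_size;
	}

	fl->addrs[fl->used++] = addr;
	return 0;
}
```
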
> > +
> > +static int show_info(struct seq_file *file, void *v)
> > +{
> > +       int i;
> > +       unsigned long flags;
> > +
> > +       /* show stats */
> > +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> > +       for (i = 0; i < kcsan_counter_count; ++i)
> > +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> > +                          atomic_long_read(&counters[i]));
> > +
> > +       /* show filter functions, and filter type */
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       seq_printf(file, "\n%s functions: %s\n",
> > +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> > +                  report_filterlist.used == 0 ? "none" : "");
> > +       for (i = 0; i < report_filterlist.used; ++i)
> > +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +
> > +       return 0;
> > +}
> > +
> > +static int debugfs_open(struct inode *inode, struct file *file)
> > +{
> > +       return single_open(file, show_info, NULL);
> > +}
> > +
> > +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> > +                            size_t count, loff_t *off)
> > +{
> > +       char kbuf[KSYM_NAME_LEN];
> > +       char *arg;
> > +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> > +
> > +       if (copy_from_user(kbuf, buf, read_len))
> > +               return -EINVAL;
> > +       kbuf[read_len] = '\0';
> > +       arg = strstrip(kbuf);
> > +
> > +       if (!strncmp(arg, "on", sizeof("on") - 1))
> > +               WRITE_ONCE(kcsan_enabled, true);
> > +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> > +               WRITE_ONCE(kcsan_enabled, false);
> > +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> > +               set_report_filterlist_whitelist(true);
> > +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> > +               set_report_filterlist_whitelist(false);
> > +       else if (arg[0] == '!')
> > +               insert_report_filterlist(&arg[1]);
> > +       else
> > +               return -EINVAL;
> > +
> > +       return count;
> > +}
> > +
> > +static const struct file_operations debugfs_ops = { .read = seq_read,
> > +                                                   .open = debugfs_open,
> > +                                                   .write = debugfs_write,
> > +                                                   .release = single_release };
> > +
> > +void __init kcsan_debugfs_init(void)
> > +{
> > +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> > +}
> > diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> > new file mode 100644
> > index 000000000000..8f9b1ce0e59f
> > --- /dev/null
> > +++ b/kernel/kcsan/encoding.h
> > @@ -0,0 +1,94 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_ENCODING_H
> > +#define _MM_KCSAN_ENCODING_H
> > +
> > +#include <linux/bits.h>
> > +#include <linux/log2.h>
> > +#include <linux/mm.h>
> > +
> > +#include "kcsan.h"
> > +
> > +#define SLOT_RANGE PAGE_SIZE
> > +#define INVALID_WATCHPOINT 0
> > +#define CONSUMED_WATCHPOINT 1
> > +
> > +/*
> > + * The maximum useful size of accesses for which we set up watchpoints is the
> > + * max range of slots we check on an access.
> > + */
> > +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> > +
> > +/*
> > + * Number of bits we use to store size info.
> > + */
> > +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> > +/*
> > + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> > + * bits;
> > + * however, most 64-bit architectures do not use the full 64-bit address space.
> > + * Also, in order for a false positive to be observable 2 things need to happen:
> > + *
> > + *     1. different addresses but with the same encoded address race;
> > + *     2. and both map onto the same watchpoint slots;
> > + *
> > + * Both these are assumed to be very unlikely. However, in case it still
> > + * happens, the report logic will filter out the false positive (see report.c).
> > + */
> > +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> > +
> > +/*
> > + * Masks to set/retrieve the encoded data.
> > + */
> > +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> > +#define WATCHPOINT_SIZE_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> > +#define WATCHPOINT_ADDR_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> > +
> > +static inline bool check_encodable(unsigned long addr, size_t size)
> > +{
> > +       return size <= MAX_ENCODABLE_SIZE;
> > +}
> > +
> > +static inline long encode_watchpoint(unsigned long addr, size_t size,
> > +                                    bool is_write)
> > +{
> > +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> > +                     (size << WATCHPOINT_ADDR_BITS) |
> > +                     (addr & WATCHPOINT_ADDR_MASK));
> > +}
> > +
> > +static inline bool decode_watchpoint(long watchpoint,
> > +                                    unsigned long *addr_masked, size_t *size,
> > +                                    bool *is_write)
> > +{
> > +       if (watchpoint == INVALID_WATCHPOINT ||
> > +           watchpoint == CONSUMED_WATCHPOINT)
> > +               return false;
> > +
> > +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> > +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> > +               WATCHPOINT_ADDR_BITS;
> > +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> > +
> > +       return true;
> > +}
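
To make the bit layout concrete, here is a userspace round-trip sketch of
the encoding. The assumptions are loud ones: 64-bit longs, 4 KiB pages and
KCSAN_CHECK_ADJACENT == 1, which gives 14 size bits; all SKETCH_* names and
both functions are made up for this example. The masks here are derived
directly from the address width rather than copied from the quoted
GENMASK() expressions, so treat this as a model of the idea, not of the
exact macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: bits_per(2 * 4096) == 14 size bits, 1 write bit, rest address. */
#define SKETCH_MAX_SIZE  (2 * 4096)
#define SKETCH_SIZE_BITS 14
#define SKETCH_ADDR_BITS (64 - 1 - SKETCH_SIZE_BITS)

#define SKETCH_WRITE_MASK (UINT64_C(1) << 63)
#define SKETCH_ADDR_MASK  ((UINT64_C(1) << SKETCH_ADDR_BITS) - 1)
#define SKETCH_SIZE_MASK  (~(SKETCH_WRITE_MASK | SKETCH_ADDR_MASK))

static uint64_t sketch_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= SKETCH_MAX_SIZE);	/* cf. check_encodable() */

	return (is_write ? SKETCH_WRITE_MASK : 0) |
	       ((uint64_t)size << SKETCH_ADDR_BITS) |
	       (addr & SKETCH_ADDR_MASK);
}

static bool sketch_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			  bool *is_write)
{
	if (wp == 0 || wp == 1)	/* INVALID_WATCHPOINT / CONSUMED_WATCHPOINT */
		return false;

	*addr_masked = wp & SKETCH_ADDR_MASK;
	*size = (size_t)((wp & SKETCH_SIZE_MASK) >> SKETCH_ADDR_BITS);
	*is_write = (wp & SKETCH_WRITE_MASK) != 0;
	return true;
}
```

Two addresses that differ only in the discarded upper bits encode to the
same value, which is the source of the encoding false positives that
report.c filters out.
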
> > +
> > +/*
> > + * Return watchpoint slot for an address.
> > + */
> > +static inline int watchpoint_slot(unsigned long addr)
> > +{
> > +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> > +}
> > +
> > +static inline bool matching_access(unsigned long addr1, size_t size1,
> > +                                  unsigned long addr2, size_t size2)
> > +{
> > +       unsigned long end_range1 = addr1 + size1 - 1;
> > +       unsigned long end_range2 = addr2 + size2 - 1;
> > +
> > +       return addr1 <= end_range2 && addr2 <= end_range1;
> > +}
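
A small userspace sketch of the two helpers above, to show how slots and
the overlap test behave on concrete numbers. The SKETCH_* constants and
function names are assumptions for this example (4 KiB pages, 64
watchpoints as in the patch); the logic follows the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE       4096UL	/* assumed page size */
#define SKETCH_NUM_WATCHPOINTS 64UL	/* KCSAN_NUM_WATCHPOINTS in the patch */

/* All addresses within one page map to the same slot. */
static unsigned long sketch_slot(unsigned long addr)
{
	return (addr / SKETCH_PAGE_SIZE) % SKETCH_NUM_WATCHPOINTS;
}

/* True if [addr1, addr1 + size1) and [addr2, addr2 + size2) overlap. */
static bool sketch_matching_access(unsigned long addr1, size_t size1,
				   unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);	/* inclusive ends need size >= 1 */
	end1 = addr1 + size1 - 1;
	end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

For example, a 4-byte access at 100 overlaps a 1-byte access at 103 but not
a 4-byte access at 104, and addresses 0x1fff and 0x2000 land in adjacent
slots, which is why accesses straddling a slot boundary need the adjacent
slot check.
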
> > +
> > +#endif /* _MM_KCSAN_ENCODING_H */
> > diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> > new file mode 100644
> > index 000000000000..45cf2fffd8a0
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> > + * see Documentation/dev-tools/kcsan.rst.
> > + */
> > +
> > +#include <linux/export.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * KCSAN uses the same instrumentation that is emitted by supported compilers
> > + * for Thread Sanitizer (TSAN).
> > + *
> > + * When enabled, the compiler emits instrumentation calls (the functions
> > + * prefixed with "__tsan" below) for all loads and stores that it generated;
> > + * inline asm is not instrumented.
> > + */
> > +
> > +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> > +       void __tsan_read##size(void *ptr)                                      \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> > +       void __tsan_write##size(void *ptr)                                     \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_write##size)
> > +
> > +DEFINE_TSAN_READ_WRITE(1);
> > +DEFINE_TSAN_READ_WRITE(2);
> > +DEFINE_TSAN_READ_WRITE(4);
> > +DEFINE_TSAN_READ_WRITE(8);
> > +DEFINE_TSAN_READ_WRITE(16);
> > +
> > +/*
> > + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> > + * but e.g. recent versions of Clang do.
> > + */
> > +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> > +       void __tsan_unaligned_read##size(void *ptr)                            \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> > +       void __tsan_unaligned_write##size(void *ptr)                           \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> > +
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> > +
> > +void __tsan_read_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_read(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_read_range);
> > +
> > +void __tsan_write_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_write(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_write_range);
> > +
> > +/*
> > + * The functions below are not required by KCSAN, but can still be emitted by
> > + * the compiler.
> > + */
> > +void __tsan_func_entry(void *call_pc)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_entry);
> > +void __tsan_func_exit(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_exit);
> > +void __tsan_init(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_init);
> > diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> > new file mode 100644
> > index 000000000000..429479b3041d
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.h
> > @@ -0,0 +1,140 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_KCSAN_H
> > +#define _MM_KCSAN_KCSAN_H
> > +
> > +#include <linux/kcsan.h>
> > +
> > +/*
> > + * Total number of watchpoints. An address range maps into a specific slot as
> > + * specified in `encoding.h`. Although a larger number of watchpoints may not even
> > + * be usable due to limited thread count, a larger value will improve
> > + * performance due to reducing cache-line contention.
> > + */
> > +#define KCSAN_NUM_WATCHPOINTS 64
> > +
> > +/*
> > + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> > + *
> > + *     1. the address slot is already occupied, check if any adjacent slots are
> > + *        free;
> > + *     2. accesses that straddle a slot boundary due to size that exceeds a
> > + *        slot's range may check adjacent slots if any watchpoint matches.
> > + *
> > + * Note that accesses with very large size may still miss a watchpoint; however,
> > + * given this should be rare, this is a reasonable trade-off to make, since this
> > + * will avoid:
> > + *
> > + *     1. excessive contention between watchpoint checks and setup;
> > + *     2. larger number of simultaneous watchpoints without sacrificing
> > + *        performance.
> > + */
> > +#define KCSAN_CHECK_ADJACENT 1
> > +
> > +/*
> > + * Globally enable and disable KCSAN.
> > + */
> > +extern bool kcsan_enabled;
> > +
> > +/*
> > + * Helper that returns true if access to ptr should be considered as an atomic
> > + * access, even though it is not explicitly atomic.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr);
> > +
> > +/*
> > + * Initialize debugfs file.
> > + */
> > +void kcsan_debugfs_init(void);
> > +
> > +enum kcsan_counter_id {
> Labels in enums should be capitalized:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl

Done @ v3.

> > +       /*
> > +        * Number of watchpoints currently in use.
> > +        */
> > +       kcsan_counter_used_watchpoints,
> > +
> > +       /*
> > +        * Total number of watchpoints set up.
> > +        */
> > +       kcsan_counter_setup_watchpoints,
> > +
> > +       /*
> > +        * Total number of data-races.
> > +        */
> > +       kcsan_counter_data_races,
> > +
> > +       /*
> > +        * Number of times no watchpoints were available.
> > +        */
> > +       kcsan_counter_no_capacity,
> > +
> > +       /*
> > +        * A thread checking a watchpoint raced with another checking thread;
> > +        * only one will be reported.
> > +        */
> > +       kcsan_counter_report_races,
> > +
> > +       /*
> > +        * Observed data value change, but writer thread unknown.
> > +        */
> > +       kcsan_counter_races_unknown_origin,
> > +
> > +       /*
> > +        * The access cannot be encoded to a valid watchpoint.
> > +        */
> > +       kcsan_counter_unencodable_accesses,
> > +
> > +       /*
> > +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> > +        * accesses.
> > +        */
> > +       kcsan_counter_encoding_false_positives,
> > +
> > +       kcsan_counter_count, /* number of counters */
> > +};
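
As an illustration of the capitalized spelling, the counter enum could look
like the sketch below. The exact label names are an assumption for this
example, not a quote of v3. The name lookup is written as a table with
designated initializers instead of the switch in debugfs.c, which is a
swapped-in technique used here only to keep the example short; assert()
stands in for BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed spelling: same counters as above, labels capitalized. */
enum kcsan_counter_id {
	KCSAN_COUNTER_USED_WATCHPOINTS,
	KCSAN_COUNTER_SETUP_WATCHPOINTS,
	KCSAN_COUNTER_DATA_RACES,
	KCSAN_COUNTER_NO_CAPACITY,
	KCSAN_COUNTER_REPORT_RACES,
	KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN,
	KCSAN_COUNTER_UNENCODABLE_ACCESSES,
	KCSAN_COUNTER_ENCODING_FALSE_POSITIVES,

	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* One name per counter, indexed by the enum value. */
static const char *const counter_names[KCSAN_COUNTER_COUNT] = {
	[KCSAN_COUNTER_USED_WATCHPOINTS]	 = "used_watchpoints",
	[KCSAN_COUNTER_SETUP_WATCHPOINTS]	 = "setup_watchpoints",
	[KCSAN_COUNTER_DATA_RACES]		 = "data_races",
	[KCSAN_COUNTER_NO_CAPACITY]		 = "no_capacity",
	[KCSAN_COUNTER_REPORT_RACES]		 = "report_races",
	[KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN]	 = "races_unknown_origin",
	[KCSAN_COUNTER_UNENCODABLE_ACCESSES]	 = "unencodable_accesses",
	[KCSAN_COUNTER_ENCODING_FALSE_POSITIVES] = "encoding_false_positives",
};

static const char *counter_to_name(enum kcsan_counter_id id)
{
	/* Out-of-range ids are a programming error. */
	assert((unsigned int)id < (unsigned int)KCSAN_COUNTER_COUNT);
	return counter_names[id];
}
```
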
> > +
> > +/*
> > + * Increment/decrement counter with given id; avoid calling these in fast-path.
> > + */
> > +void kcsan_counter_inc(enum kcsan_counter_id id);
> > +void kcsan_counter_dec(enum kcsan_counter_id id);
> > +
> > +/*
> > + * Returns true if data-races in the function symbol that maps to addr (offsets
> > + * are ignored) should *not* be reported.
> > + */
> > +bool kcsan_skip_report(unsigned long func_addr);
> > +
> > +enum kcsan_report_type {
> > +       /*
> > +        * The thread that set up the watchpoint and briefly stalled was
> > +        * signalled that another thread triggered the watchpoint, and thus a
> > +        * race was encountered.
> > +        */
> > +       kcsan_report_race_setup,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, therefore a race
> > +        * was encountered.
> > +        */
> > +       kcsan_report_race_check,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, but the other
> > +        * racing thread can no longer be signaled that a race occurred.
> > +        */
> > +       kcsan_report_race_check_race,
> > +
> > +       /*
> > +        * No other thread was observed to race with the access, but the data
> > +        * value before and after the stall differs.
> > +        */
> > +       kcsan_report_race_unknown_origin,
> > +};
> > +/*
> > + * Print a race report from thread that encountered the race.
> > + */
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type);
> > +
> > +#endif /* _MM_KCSAN_KCSAN_H */
> > diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> > new file mode 100644
> > index 000000000000..517db539e4e7
> > --- /dev/null
> > +++ b/kernel/kcsan/report.c
> > @@ -0,0 +1,306 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/preempt.h>
> > +#include <linux/printk.h>
> > +#include <linux/sched.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/stacktrace.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Max. number of stack entries to show in the report.
> > + */
> > +#define NUM_STACK_ENTRIES 16
> > +
> > +/*
> > + * Other thread info: communicated from the other racing thread to the thread
> > + * that set up the watchpoint, which then prints the complete report atomically.
> > + * Only one struct is needed, as all threads must be serialized to print the
> > + * reports anyway, with reporting being in the slow-path.
> > + */
> > +static struct {
> > +       const volatile void *ptr;
> > +       size_t size;
> > +       bool is_write;
> > +       int task_pid;
> > +       int cpu_id;
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> > +       int num_stack_entries;
> > +} other_info = { .ptr = NULL };
> > +
> > +static DEFINE_SPINLOCK(other_info_lock);
> > +static DEFINE_SPINLOCK(report_lock);
> > +
> > +static bool set_or_lock_other_info(unsigned long *flags,
> > +                                  const volatile void *ptr, size_t size,
> > +                                  bool is_write, int cpu_id,
> > +                                  enum kcsan_report_type type)
> > +{
> > +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> > +               return true;
> > +
> > +       for (;;) {
> > +               spin_lock_irqsave(&other_info_lock, *flags);
> > +
> > +               switch (type) {
> > +               case kcsan_report_race_check:
> > +                       if (other_info.ptr != NULL) {
> > +                               /* still in use, retry */
> > +                               break;
> > +                       }
> > +                       other_info.ptr = ptr;
> > +                       other_info.size = size;
> > +                       other_info.is_write = is_write;
> > +                       other_info.task_pid =
> > +                               in_task() ? task_pid_nr(current) : -1;
> > +                       other_info.cpu_id = cpu_id;
> > +                       other_info.num_stack_entries = stack_trace_save(
> > +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> > +                       /*
> > +                        * other_info may now be consumed by thread we raced
> > +                        * with.
> > +                        */
> > +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> > +                       return false;
> > +
> > +               case kcsan_report_race_setup:
> > +                       if (other_info.ptr == NULL)
> > +                               break; /* no data available yet, retry */
> > +
> > +                       /*
> > +                        * First check if matching based on how watchpoint was
> > +                        * encoded.
> > +                        */
> > +                       if (!matching_access((unsigned long)other_info.ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            size))
> > +                               break; /* mismatching access, retry */
> > +
> > +                       if (!matching_access((unsigned long)other_info.ptr,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr, size)) {
> > +                               /*
> > +                                * If the actual accesses do not match, this was
> > +                                * a false positive due to watchpoint encoding.
> > +                                */
> > +                               other_info.ptr = NULL; /* mark for reuse */
> > +                               kcsan_counter_inc(
> > +                                       kcsan_counter_encoding_false_positives);
> > +                               spin_unlock_irqrestore(&other_info_lock,
> > +                                                      *flags);
> > +                               return false;
> > +                       }
> > +
> > +                       /*
> > +                        * Matching access: keep other_info locked, as this
> > +                        * thread uses it to print the full report; unlocked in
> > +                        * end_report.
> > +                        */
> > +                       return true;
> > +
> > +               default:
> > +                       BUG();
> > +               }
> > +
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +       }
> > +}
> > +
> > +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               /* irqsaved already via other_info_lock */
> > +               spin_lock(&report_lock);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_lock_irqsave(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               other_info.ptr = NULL; /* mark for reuse */
> > +               spin_unlock(&report_lock);
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_unlock_irqrestore(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static const char *get_access_type(bool is_write)
> > +{
> > +       return is_write ? "write" : "read";
> > +}
> > +
> > +/* Return thread description: in task or interrupt. */
> > +static const char *get_thread_desc(int task_id)
> > +{
> > +       if (task_id != -1) {
> > +               static char buf[32]; /* safe: protected by report_lock */
> > +
> > +               snprintf(buf, sizeof(buf), "task %i", task_id);
> > +               return buf;
> > +       }
> > +       return in_nmi() ? "NMI" : "interrupt";
> > +}
> > +
> > +/* Helper to skip KCSAN-related functions in stack-trace. */
> > +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> > +{
> > +       char buf[64];
> > +       int skip = 0;
> > +
> > +       for (; skip < num_entries; ++skip) {
> > +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> > +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> > +                       break;
> > +               }
> > +       }
> > +       return skip;
> > +}
> FWIW another option is to put all KCSAN-related functions in a
> separate code section and check if the function addresses are in the
> address range belonging to that section.
> This will work even with non-symbolized stacks.

Thanks for the suggestion. Is it worth it, i.e. will it simplify the
design and code? If it simplifies the design (or makes the fast-path
significantly faster), then yes, but otherwise I prefer the simplest
possible solution here. AFAIK, it will make it neither simpler nor
faster. Using non-symbolized stacks should not be the common use-case
(how would one usefully debug a data-race without symbols?).

> > +/* Compares symbolized strings of addr1 and addr2. */
> > +static int sym_strcmp(void *addr1, void *addr2)
> > +{
> > +       char buf1[64];
> > +       char buf2[64];
> > +
> > +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> > +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> > +       return strncmp(buf1, buf2, sizeof(buf1));
> > +}
> > +
> > +/*
> > + * Returns true if a report was generated, false otherwise.
> > + */
> > +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> > +                         int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> > +       int num_stack_entries =
> > +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> > +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> > +       int other_skipnr;
> > +
> > +       /* Check if the top stackframe is in a blacklisted function. */
> > +       if (kcsan_skip_report(stack_entries[skipnr]))
> > +               return false;
> > +       if (type == kcsan_report_race_setup) {
> > +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> > +                                               other_info.num_stack_entries);
> > +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> > +                       return false;
> > +       }
> > +
> > +       /* Print report header. */
> > +       pr_err("==================================================================\n");
> > +       switch (type) {
> > +       case kcsan_report_race_setup: {
> > +               void *this_fn = (void *)stack_entries[skipnr];
> > +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> > +               int cmp;
> > +
> > +               /*
> > +                * Order functions lexicographically for consistent bug titles.
> > +                * Do not print offset of functions to keep title short.
> > +                */
> > +               cmp = sym_strcmp(other_fn, this_fn);
> > +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> > +                      cmp < 0 ? other_fn : this_fn,
> > +                      cmp < 0 ? this_fn : other_fn);
> > +       } break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("BUG: KCSAN: data-race in %pS\n",
> > +                      (void *)stack_entries[skipnr]);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +
> > +       pr_err("\n");
> > +
> > +       /* Print information about the racing accesses. */
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(other_info.is_write), other_info.ptr,
> > +                      other_info.size, get_thread_desc(other_info.task_pid),
> > +                      other_info.cpu_id);
> > +
> > +               /* Print the other thread's stack trace. */
> > +               stack_trace_print(other_info.stack_entries + other_skipnr,
> > +                                 other_info.num_stack_entries - other_skipnr,
> > +                                 0);
> > +
> > +               pr_err("\n");
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +       /* Print stack trace of this thread. */
> > +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> > +                         0);
> > +
> > +       /* Print report footer. */
> > +       pr_err("\n");
> > +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> > +       dump_stack_print_info(KERN_DEFAULT);
> > +       pr_err("==================================================================\n");
> > +
> > +       return true;
> > +}
> > +
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long flags = 0;
> > +
> > +       if (type == kcsan_report_race_check_race)
> > +               return;
> > +
> > +       kcsan_disable_current();
> > +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> > +               start_report(&flags, type);
> > +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> > +                   panic_on_warn)
> > +                       panic("panic_on_warn set ...\n");
> > +
> > +               end_report(&flags, type);
> > +       }
> > +       kcsan_enable_current();
> > +}
> > diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> > new file mode 100644
> > index 000000000000..68c896a24529
> > --- /dev/null
> > +++ b/kernel/kcsan/test.c
> > @@ -0,0 +1,117 @@
> > +// SPDX-License-Identifier: GPL-2.0
> IIRC checkpatch.pl requires all SPDX headers to look like this one
> (C++-style, not C-style).
> Please double check and fix the headers in other files if necessary.

Checkpatch is happy: // for .c files, and /* */ for .h files.

> This file might also use some comments, now it's not easy to
> understand what it's testing.

Done @ v3.

> > +
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/printk.h>
> > +#include <linux/random.h>
> > +#include <linux/types.h>
> > +
> > +#include "encoding.h"
> > +
> > +#define ITERS_PER_TEST 2000
> > +
> > +/* Test requirements. */
> > +static bool test_requires(void)
> > +{
> > +       /* random should be initialized */
> > +       return prandom_u32() + prandom_u32() != 0;
> > +}
> > +
> > +/* Test watchpoint encode and decode. */
> > +static bool test_encode_decode(void)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> > +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> > +               bool is_write = prandom_u32() % 2;
> > +               unsigned long addr;
> > +
> > +               prandom_bytes(&addr, sizeof(addr));
> > +               if (WARN_ON(!check_encodable(addr, size)))
> > +                       return false;
> > +
> > +               /* encode and decode */
> > +               {
> > +                       const long encoded_watchpoint =
> > +                               encode_watchpoint(addr, size, is_write);
> > +                       unsigned long verif_masked_addr;
> > +                       size_t verif_size;
> > +                       bool verif_is_write;
> > +
> > +                       /* check special watchpoints */
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +
> > +                       /* check decoding watchpoint returns same data */
> > +                       if (WARN_ON(!decode_watchpoint(
> > +                                   encoded_watchpoint, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(verif_masked_addr !=
> > +                                   (addr & WATCHPOINT_ADDR_MASK)))
> > +                               goto fail;
> > +                       if (WARN_ON(verif_size != size))
> > +                               goto fail;
> > +                       if (WARN_ON(is_write != verif_is_write))
> > +                               goto fail;
> > +
> > +                       continue;
> > +fail:
> > +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> > +                              __func__, is_write ? "write" : "read", size,
> > +                              addr, encoded_watchpoint,
> > +                              verif_is_write ? "write" : "read", verif_size,
> > +                              verif_masked_addr);
> > +                       return false;
> > +               }
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static bool test_matching_access(void)
> > +{
> > +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> > +               return false;
> > +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> > +               return false;
> > +       return true;
> > +}
> > +
> > +static int __init kcsan_selftest(void)
> > +{
> > +       int passed = 0;
> > +       int total = 0;
> > +
> > +#define RUN_TEST(do_test)                                                      \
> > +       do {                                                                   \
> > +               ++total;                                                       \
> > +               if (do_test())                                                 \
> > +                       ++passed;                                              \
> > +               else                                                           \
> > +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> > +       } while (0)
> > +
> > +       RUN_TEST(test_requires);
> > +       RUN_TEST(test_encode_decode);
> > +       RUN_TEST(test_matching_access);
> > +
> > +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> > +       if (passed != total)
> > +               panic("KCSAN selftests failed");
> > +       return 0;
> > +}
> > +postcore_initcall(kcsan_selftest);
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 93d97f9b0157..35accd1d93de 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
> >
> >  source "lib/Kconfig.ubsan"
> >
> > +source "lib/Kconfig.kcsan"
> > +
> >  config ARCH_HAS_DEVMEM_IS_ALLOWED
> >         bool
> >
> > diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> > new file mode 100644
> > index 000000000000..3e1f1acfb24b
> > --- /dev/null
> > +++ b/lib/Kconfig.kcsan
> > @@ -0,0 +1,88 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config HAVE_ARCH_KCSAN
> > +       bool
> > +
> > +menuconfig KCSAN
> > +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> > +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> > +       default n
> > +       help
> > +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> > +         uses a watchpoint-based sampling approach to detect races.
> > +
> > +if KCSAN
> > +
> > +config KCSAN_SELFTEST
> > +       bool "KCSAN: perform short selftests on boot"
> > +       default y
> > +       help
> > +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> > +
> > +config KCSAN_EARLY_ENABLE
> > +       bool "KCSAN: early enable"
> > +       default y
> > +       help
> > +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> > +         later be enabled/disabled via debugfs.
> > +
> > +config KCSAN_UDELAY_MAX_TASK
> > +       int "KCSAN: maximum delay in microseconds (for tasks)"
> > +       default 80
> > +       help
> > +         For tasks, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_UDELAY_MAX_INTERRUPT
> > +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> > +       default 20
> > +       help
> > +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_DELAY_RANDOMIZE
> > +       bool "KCSAN: randomize delays"
> > +       default y
> > +       help
> > +         If delays should be randomized; if false, the chosen delay is simply
> > +         the maximum values defined above.
> > +
> > +config KCSAN_WATCH_SKIP_INST
> > +       int "KCSAN: watchpoint instruction skip"
> > +       default 2000
> > +       help
> > +         The number of per-CPU memory operations to skip watching, before
> > +         another watchpoint is set up; in other words, 1 in
> > +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> > +         watchpoint. A smaller value results in more aggressive race
> > +         detection, whereas a larger value improves system performance at the
> > +         cost of missing some races.
> > +
> > +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +       bool "KCSAN: report races of unknown origin"
> > +       default y
> > +       help
> > +         If KCSAN should report races where only one access is known, and the
> > +         conflicting access is of unknown origin. This type of race is
> > +         reported if it was only possible to infer a race due to a data-value
> > +         change while an access is being delayed on a watchpoint.
> > +
> > +config KCSAN_IGNORE_ATOMICS
> > +       bool "KCSAN: do not instrument marked atomic accesses"
> > +       default n
> > +       help
> > +         If enabled, never instruments marked atomic accesses. This results in
> > +         not reporting data-races where one access is atomic and the other is
> > +         a plain access.
> > +
> Isn't it better to decide at runtime, whether we want to ignore atomics or not?

See below.

> > +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> > +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> > +       default n
> > +       help
> > +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> > +         This option should only be used to prune initial data-races found in
> > +         existing code.
> Overall, I think it's better to make most of these configs boot-time flags.
> This way one won't need to rebuild the kernel every time they want to
> turn some option on or off.

From a design point of view, this complicates things on several
fronts. For one, I would prefer having config options in one place.
Moreover, most of these were added to "tame" syzbot and keep the
initial reporting volume low. I do not expect these to be switched
frequently, so for simplicity's sake, and to optimize for the common
use-case, it'll be better to keep them as-is. Eventually, these might
even go away completely.

I will add a comment to that effect above these options for v3.

> > +config KCSAN_DEBUG
> > +       bool "Debugging of KCSAN internals"
> > +       default n
> > +
> > +endif # KCSAN
> > diff --git a/lib/Makefile b/lib/Makefile
> > index c5892807e06f..778ab704e3ad 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
> >  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
> >  endif
> >
> > +# Used by KCSAN while enabled, avoid recursion.
> > +KCSAN_SANITIZE_random32.o := n
> > +
> >  lib-y := ctype.o string.o vsprintf.o cmdline.o \
> >          rbtree.o radix-tree.o timerqueue.o xarray.o \
> >          idr.o extable.o \
> > diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> > new file mode 100644
> > index 000000000000..caf1111a28ae
> > --- /dev/null
> > +++ b/scripts/Makefile.kcsan
> > @@ -0,0 +1,6 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +ifdef CONFIG_KCSAN
> > +
> > +CFLAGS_KCSAN := -fsanitize=thread
> > +
> > +endif # CONFIG_KCSAN
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 179d55af5852..0e78abab7d83 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
> >         $(CFLAGS_KCOV))
> >  endif
> >
> > +#
> > +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> "KernelConcurrencySanitizer" or "Kernel Concurrency Sanitizer", maybe?

Done @ v3.

> > +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> > +#
> > +ifeq ($(CONFIG_KCSAN),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> > +       $(CFLAGS_KCSAN))
> > +endif
> > +
> >  # $(srctree)/$(src) for including checkin headers from generated source files
> >  # $(objtree)/$(obj) for including generated headers from checkin source files
> >  ifeq ($(KBUILD_EXTMOD),)
> > --
> > 2.23.0.866.gb869b98d4c-goog
> >

Thanks for your comments!
-- Marco


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-21 15:54       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-21 15:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern, Andrea Parri,
	Andrey Konovalov, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Boqun Feng, Borislav Petkov, Daniel Axtens, Daniel Lustig,
	dave.hansen, David Howells, Dmitriy Vyukov, H. Peter Anvin,
	Ingo Molnar, Jade Alglave, Joel Fernandes, Jonathan Corbet,
	Josh Poimboeuf, Luc Maranget, Mark Rutland, Nicholas Piggin,
	Paul McKenney, Peter Zijlstra, Thomas Gleixner, Will Deacon,
	kasan-dev, linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On Mon, 21 Oct 2019 at 15:37, Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> >
> > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> > kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> > See the included Documentation/dev-tools/kcsan.rst for more details.
> >
> > This patch adds basic infrastructure, but does not yet enable KCSAN for
> > any architecture.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Elaborate comment about instrumentation calls emitted by compilers.
> > * Replace kcsan_check_access(.., {true, false}) with
> >   kcsan_check_{read,write} for improved readability.
> > * Change bug title of race of unknown origin to just say "data-race in".
> > * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> > * Add comment about safety of find_watchpoint without user_access_save.
> > * Remove unnecessary preempt_disable/enable and elaborate on comment why
> >   we want to disable interrupts and preemptions.
> > * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
> >   contexts [Suggested by Mark Rutland].
> > ---
> >  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
> >  MAINTAINERS                       |  11 +
> >  Makefile                          |   3 +-
> >  include/linux/compiler-clang.h    |   9 +
> >  include/linux/compiler-gcc.h      |   7 +
> >  include/linux/compiler.h          |  35 ++-
> >  include/linux/kcsan-checks.h      | 147 ++++++++++
> >  include/linux/kcsan.h             | 108 ++++++++
> >  include/linux/sched.h             |   4 +
> >  init/init_task.c                  |   8 +
> >  init/main.c                       |   2 +
> >  kernel/Makefile                   |   1 +
> >  kernel/kcsan/Makefile             |  14 +
> >  kernel/kcsan/atomic.c             |  21 ++
> >  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
> >  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
> >  kernel/kcsan/encoding.h           |  94 +++++++
> >  kernel/kcsan/kcsan.c              |  86 ++++++
> >  kernel/kcsan/kcsan.h              | 140 ++++++++++
> >  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
> >  kernel/kcsan/test.c               | 117 ++++++++
> >  lib/Kconfig.debug                 |   2 +
> >  lib/Kconfig.kcsan                 |  88 ++++++
> >  lib/Makefile                      |   3 +
> >  scripts/Makefile.kcsan            |   6 +
> >  scripts/Makefile.lib              |  10 +
> >  26 files changed, 2069 insertions(+), 9 deletions(-)
> >  create mode 100644 Documentation/dev-tools/kcsan.rst
> >  create mode 100644 include/linux/kcsan-checks.h
> >  create mode 100644 include/linux/kcsan.h
> >  create mode 100644 kernel/kcsan/Makefile
> >  create mode 100644 kernel/kcsan/atomic.c
> >  create mode 100644 kernel/kcsan/core.c
> >  create mode 100644 kernel/kcsan/debugfs.c
> >  create mode 100644 kernel/kcsan/encoding.h
> >  create mode 100644 kernel/kcsan/kcsan.c
> >  create mode 100644 kernel/kcsan/kcsan.h
> >  create mode 100644 kernel/kcsan/report.c
> >  create mode 100644 kernel/kcsan/test.c
> >  create mode 100644 lib/Kconfig.kcsan
> >  create mode 100644 scripts/Makefile.kcsan
> >
> > diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> > new file mode 100644
> > index 000000000000..497b09e5cc96
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kcsan.rst
> > @@ -0,0 +1,203 @@
> > +The Kernel Concurrency Sanitizer (KCSAN)
> > +========================================
> > +
> > +Overview
> > +--------
> > +
> > +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> > +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> > +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> > +detector. Key priorities in KCSAN's design are lack of false positives,
> > +scalability, and simplicity. More details can be found in `Implementation
> > +Details`_.
> > +
> > +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> > +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> > +With Clang it requires version 7.0.0 or later.
> > +
> > +Usage
> > +-----
> > +
> > +To enable KCSAN configure kernel with::
> > +
> > +    CONFIG_KCSAN = y
> > +
> > +KCSAN provides several other configuration options to customize behaviour (see
> > +their respective help text for more info).
> > +
> > +debugfs
> > +~~~~~~~
> > +
> > +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> > +
> > +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> > +  ``/sys/kernel/debug/kcsan``.
> > +
> > +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> > +  ``some_func_name`` to the report filter list, which (by default) blacklists
> > +  reporting data-races where either one of the top stackframes is a function
> > +  in the list.
> > +
> > +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> > +  changes the report filtering behaviour. For example, the blacklist feature
> > +  can be used to silence frequently occurring data-races; the whitelist feature
> > +  can help with reproduction and testing of fixes.
> > +
> > +Error reports
> > +~~~~~~~~~~~~~
> > +
> > +A typical data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> > +
> > +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> > +     kernfs_refresh_inode+0x70/0x170
> > +     kernfs_iop_permission+0x4f/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     vfs_statx+0x9b/0x130
> > +     __do_sys_newlstat+0x50/0xb0
> > +     __x64_sys_newlstat+0x37/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> > +     generic_permission+0x5b/0x2a0
> > +     kernfs_iop_permission+0x66/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     do_faccessat+0x11a/0x390
> > +     __x64_sys_access+0x3c/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +The header of the report provides a short summary of the functions involved in
> > +the race. It is followed by the access types and stack traces of the 2 threads
> > +involved in the data-race.
> > +
> > +The other less common type of data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> > +
> > +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> > +     e1000_clean_rx_irq+0x551/0xb10
> > +     e1000_clean+0x533/0xda0
> > +     net_rx_action+0x329/0x900
> > +     __do_softirq+0xdb/0x2db
> > +     irq_exit+0x9b/0xa0
> > +     do_IRQ+0x9c/0xf0
> > +     ret_from_intr+0x0/0x18
> > +     default_idle+0x3f/0x220
> > +     arch_cpu_idle+0x21/0x30
> > +     do_idle+0x1df/0x230
> > +     cpu_startup_entry+0x14/0x20
> > +     rest_init+0xc5/0xcb
> > +     arch_call_rest_init+0x13/0x2b
> > +     start_kernel+0x6db/0x700
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +This report is generated where it was not possible to determine the other
> > +racing thread, but a race was inferred due to the data-value of the watched
> > +memory location having changed. These can occur either due to missing
> > +instrumentation or e.g. DMA accesses.
> > +
> > +Data-Races
> > +----------
> Nit: I was under the impression "data races" were commonly written
> without a hyphen. I may be mistaken.

Thanks. I've updated it everywhere except in bug titles, which should
remain as-is.

> > +
> > +Informally, two operations *conflict* if they access the same memory location,
> > +and at least one of them is a write operation. In an execution, two memory
> > +operations from different threads form a **data-race** if they *conflict*, at
> > +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> > +the "happens-before" order according to the `LKMM
> > +<../../tools/memory-model/Documentation/explanation.txt>`_.
> > +
> > +Relationship with the Linux Kernel Memory Model (LKMM)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The LKMM defines the propagation and ordering rules of various memory
> > +operations, which gives developers the ability to reason about concurrent code.
> > +Ultimately this allows to determine the possible executions of concurrent code,
> > +and if that code is free from data-races.
> > +
> > +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> > +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> > +words, KCSAN assumes that as long as a plain access is not observed to race
> > +with another conflicting access, memory operations are correctly ordered.
> > +
> > +This means that KCSAN will not report *potential* data-races due to missing
> > +memory ordering. If, however, missing memory ordering (that is observable with
> > +a particular compiler and architecture) leads to an observable data-race (e.g.
> > +entering a critical section erroneously), KCSAN would report the resulting
> > +data-race.
> > +
> > +Implementation Details
> > +----------------------
> > +
> > +The general approach is inspired by `DataCollider
> > +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> > +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> > +relies on compiler instrumentation. Watchpoints are implemented using an
> > +efficient encoding that stores access type, size, and address in a long; the
> > +benefits of using "soft watchpoints" are portability and greater flexibility in
> > +limiting which accesses trigger a watchpoint.
> > +
> > +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> > +memory operations; for each instrumented plain access:
> > +
> > +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> > +   write, then we encountered a racing access.
> > +
> > +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> > +   stall some delay.
> > +
> > +3. Also check the data value before the delay, and re-check the data value
> > +   after delay; if the values mismatch, we infer a race of unknown origin.
> > +
> > +To detect data-races between plain and atomic memory operations, KCSAN also
> > +annotates atomic accesses, but only to check if a watchpoint exists
> > +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> > +accesses.
> > +
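
To make the three steps above concrete, here is a minimal single-threaded sketch in plain C. It is not the code from this series: `struct watchpoint_sketch`, `sketch_check()`, `sketch_setup()` and the two callbacks are hypothetical names, there is only one unencoded watchpoint slot, and the stall is modelled by a callback that runs "during the delay".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single watchpoint slot; the series packs this into one long. */
struct watchpoint_sketch {
	const volatile void *addr;
	bool is_write;
	bool armed;
	bool consumed;
};

enum sketch_result { SKETCH_NO_RACE, SKETCH_RACE, SKETCH_RACE_UNKNOWN_ORIGIN };

static struct watchpoint_sketch sketch_wp;
static volatile uint32_t sketch_shared;

/* Step 1: an instrumented access checks for a conflicting watchpoint. */
static bool sketch_check(const volatile void *addr, bool is_write)
{
	if (sketch_wp.armed && sketch_wp.addr == addr &&
	    (is_write || sketch_wp.is_write)) {
		sketch_wp.consumed = true;
		return true; /* racing access observed */
	}
	return false;
}

/* Steps 2 and 3: arm a watchpoint, "stall", then compare the data value. */
static enum sketch_result sketch_setup(volatile uint32_t *addr, bool is_write,
				       void (*during_delay)(void))
{
	uint32_t before, after;
	bool consumed;

	assert(addr != NULL);
	before = *addr;
	sketch_wp = (struct watchpoint_sketch){
		.addr = addr, .is_write = is_write, .armed = true,
	};
	if (during_delay)
		during_delay(); /* stands in for the delay */
	after = *addr;
	consumed = sketch_wp.consumed;
	sketch_wp.armed = false;

	if (consumed)
		return SKETCH_RACE;
	if (before != after)
		return SKETCH_RACE_UNKNOWN_ORIGIN;
	return SKETCH_NO_RACE;
}

/* An instrumented writer: it checks the watchpoint before writing. */
static void instrumented_writer(void)
{
	sketch_check(&sketch_shared, true);
	sketch_shared = 1;
}

/* An uninstrumented writer (e.g. a device): only the value change shows. */
static void device_write(void)
{
	sketch_shared += 1;
}
```

With `instrumented_writer` as the callback the result is `SKETCH_RACE`; with `device_write`, which never calls `sketch_check()`, the result is `SKETCH_RACE_UNKNOWN_ORIGIN`, which corresponds to the "race of unknown origin" case in step 3.
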
> > +Key Properties
> > +~~~~~~~~~~~~~~
> > +
> > +1. **Memory Overhead:** No shadow memory is required. The current
> > +   implementation uses a small array of longs to encode watchpoint information,
> > +   which is negligible.
> > +
> > +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> > +   efficient watchpoint encoding that does not require acquiring any shared
> > +   locks in the fast-path. For kernel boot with a default config on a system
> > +   where nproc=8 we measure a slow-down of 10-15x.
> > +
> > +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> > +   may result in missed data-races (false negatives), compared to a
> > +   happens-before data-race detector.
> > +
> > +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> > +
> > +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> > +   runtime. With a happens-before data-race detector, any omission leads to
> > +   false positives, which is especially important in the context of the kernel
> > +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> > +   result, maintenance overheads are minimal as the kernel evolves.
> > +
> > +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> > +   setting up watchpoints, racy writes from devices can also be detected.
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0154674cbad3..71f7fb625490 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
> >  F:     scripts/kconfig/
> >  F:     scripts/Kconfig.include
> >
> > +KCSAN
> > +M:     Marco Elver <elver@google.com>
> > +R:     Dmitry Vyukov <dvyukov@google.com>
> > +L:     kasan-dev@googlegroups.com
> > +S:     Maintained
> > +F:     Documentation/dev-tools/kcsan.rst
> > +F:     include/linux/kcsan*.h
> > +F:     kernel/kcsan/
> > +F:     lib/Kconfig.kcsan
> > +F:     scripts/Makefile.kcsan
> > +
> >  KDUMP
> >  M:     Dave Young <dyoung@redhat.com>
> >  M:     Baoquan He <bhe@redhat.com>
> > diff --git a/Makefile b/Makefile
> > index ffd7a912fc46..ad4729176252 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
> >
> >  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
> >  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> > -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> > +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
> >  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
> >  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
> >  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> > @@ -900,6 +900,7 @@ endif
> >  include scripts/Makefile.kasan
> >  include scripts/Makefile.extrawarn
> >  include scripts/Makefile.ubsan
> > +include scripts/Makefile.kcsan
> >
> >  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
> >  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > index 333a6695a918..a213eb55e725 100644
> > --- a/include/linux/compiler-clang.h
> > +++ b/include/linux/compiler-clang.h
> > @@ -24,6 +24,15 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_feature(thread_sanitizer)
> > +/* emulate gcc's __SANITIZE_THREAD__ flag */
> > +#define __SANITIZE_THREAD__
> > +#define __no_sanitize_thread \
> > +               __attribute__((no_sanitize("thread")))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  /*
> >   * Not all versions of clang implement the the type-generic versions
> >   * of the builtin overflow checkers. Fortunately, clang implements
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index d7ee4c6bad48..de105ca29282 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -145,6 +145,13 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> > +#define __no_sanitize_thread                                                   \
> > +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  #if GCC_VERSION >= 50100
> >  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
> >  #endif
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 5e88e7e33abe..350d80dbee4d 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >  #endif
> >
> >  #include <uapi/linux/types.h>
> > +#include <linux/kcsan-checks.h>
> >
> >  #define __READ_ONCE_SIZE                                               \
> >  ({                                                                     \
> > @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >         }                                                               \
> >  })
> >
> > -static __always_inline
> > -void __read_once_size(const volatile void *p, void *res, int size)
> > -{
> > -       __READ_ONCE_SIZE;
> > -}
> > -
> >  #ifdef CONFIG_KASAN
> >  /*
> >   * We can't declare function 'inline' because __no_sanitize_address confilcts
> > @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
> >  # define __no_kasan_or_inline __always_inline
> >  #endif
> >
> > -static __no_kasan_or_inline
> > +#ifdef CONFIG_KCSAN
> > +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +# define __no_kcsan_or_inline __always_inline
> > +#endif
> > +
> > +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> > +/* Avoid any instrumentation or inline. */
> > +#define __no_sanitize_or_inline                                                \
> > +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +#define __no_sanitize_or_inline __always_inline
> > +#endif
> > +
> > +static __no_kcsan_or_inline
> > +void __read_once_size(const volatile void *p, void *res, int size)
> > +{
> > +       kcsan_check_atomic_read((const void *)p, size);
> > +       __READ_ONCE_SIZE;
> > +}
> > +
> > +static __no_sanitize_or_inline
> >  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
> >  {
> >         __READ_ONCE_SIZE;
> >  }
> >
> > -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> > +static __no_kcsan_or_inline
> > +void __write_once_size(volatile void *p, void *res, int size)
> >  {
> > +       kcsan_check_atomic_write((const void *)p, size);
> > +
> >         switch (size) {
> >         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> >         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> > diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> > new file mode 100644
> > index 000000000000..4203603ae852
> > --- /dev/null
> > +++ b/include/linux/kcsan-checks.h
> > @@ -0,0 +1,147 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_CHECKS_H
> > +#define _LINUX_KCSAN_CHECKS_H
> > +
> > +#include <linux/types.h>
> > +
> > +/*
> > + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> > + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> > + * to validate access to an address.   Never use these in header files!
> > + */
> > +#ifdef CONFIG_KCSAN
> > +/**
> > + * __kcsan_check_watchpoint - check if a watchpoint exists
> > + *
> > + * Returns true if no race was detected, and we may then proceed to set up a
> > + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> > + * encountered, and we may not set up a watchpoint after.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + * @return true if no race was detected, false otherwise.
> > + */
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> I think the parameter indentations are a bit off here and below (I've
> also looked at the Github diff);
> have you considered running checkpatch.pl?

It was formatted with clang-format; it is correct with 8-space tabs.
checkpatch.pl is happy.

> > +
> > +/**
> > + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> > + *
> > + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> > + * reports the data-race.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + */
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> > +#else
> > +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/*
> > + * kcsan_*: Only available when the particular compilation unit has KCSAN
> > + * instrumentation enabled. May be used in header files.
> > + */
> > +#ifdef __SANITIZE_THREAD__
> > +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> > +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> > +#else
> > +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/**
> > + * __kcsan_check_read - check regular read access for data-races
> > + *
> > + * Full read access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled. Note that, setting up watchpoints for plain reads is
> > + * required to also detect data-races with atomic accesses.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_read(ptr, size)                                          \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> > +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> > +       } while (0)
> > +
> > +/**
> > + * __kcsan_check_write - check regular write access for data-races
> > + *
> > + * Full write access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_write(ptr, size)                                         \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_read - check regular read access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_read(ptr, size)                                            \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> > +                       kcsan_setup_watchpoint(ptr, size, false);              \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_write - check regular write access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_write(ptr, size)                                           \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       kcsan_setup_watchpoint(ptr, size, true);               \
> > +       } while (0)
> > +
> > +/*
> > + * Check for atomic accesses: if atomic access are not ignored, this simply
> > + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> > + */
> > +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> > +#define kcsan_check_atomic_read(...)                                           \
> > +       do {                                                                   \
> > +       } while (0)
> > +#define kcsan_check_atomic_write(...)                                          \
> > +       do {                                                                   \
> > +       } while (0)
> > +#else
> > +#define kcsan_check_atomic_read(ptr, size)                                     \
> > +       kcsan_check_watchpoint(ptr, size, false)
> > +#define kcsan_check_atomic_write(ptr, size)                                    \
> > +       kcsan_check_watchpoint(ptr, size, true)
> > +#endif
> > +
> > +#endif /* _LINUX_KCSAN_CHECKS_H */
> > diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> > new file mode 100644
> > index 000000000000..fd5de2ba3a16
> > --- /dev/null
> > +++ b/include/linux/kcsan.h
> > @@ -0,0 +1,108 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_H
> > +#define _LINUX_KCSAN_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +#ifdef CONFIG_KCSAN
> > +
> > +/*
> > + * Context for each thread of execution: for tasks, this is stored in
> > + * task_struct, and interrupts access internal per-CPU storage.
> > + */
> > +struct kcsan_ctx {
> > +       int disable; /* disable counter */
> > +       int atomic_next; /* number of following atomic ops */
> > +
> > +       /*
> > +        * We use separate variables to store if we are in a nestable or flat
> > +        * atomic region. This helps make sure that an atomic region with
> > +        * nesting support is not suddenly aborted when a flat region is
> > +        * contained within. Effectively this allows supporting nesting flat
> > +        * atomic regions within an outer nestable atomic region. Support for
> > +        * this is required as there are cases where a seqlock reader critical
> > +        * section (flat atomic region) is contained within a seqlock writer
> > +        * critical section (nestable atomic region), and the "mismatching
> > +        * kcsan_end_atomic()" warning would trigger otherwise.
> > +        */
> > +       int atomic_region;
> > +       bool atomic_region_flat;
> > +};
> > +
> > +/**
> > + * kcsan_init - initialize KCSAN runtime
> > + */
> > +void kcsan_init(void);
> > +
> > +/**
> > + * kcsan_disable_current - disable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_disable_current(void);
> > +
> > +/**
> > + * kcsan_enable_current - re-enable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_enable_current(void);
> > +
> > +/**
> > + * kcsan_begin_atomic - use to denote an atomic region
> > + *
> > + * Accesses within the atomic region may appear to race with other accesses but
> > + * should be considered atomic.
> > + *
> > + * @nest true if regions may be nested, or false for flat region
> > + */
> > +void kcsan_begin_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_end_atomic - end atomic region
> > + *
> > + * @nest must match argument to kcsan_begin_atomic().
> > + */
> > +void kcsan_end_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_atomic_next - consider following accesses as atomic
> > + *
> > + * Force treating the next n memory accesses for the current context as atomic
> > + * operations.
> > + *
> > + * @n number of following memory accesses to treat as atomic.
> > + */
> > +void kcsan_atomic_next(int n);
> > +
> > +#else /* CONFIG_KCSAN */
> > +
> > +static inline void kcsan_init(void)
> I think it should be ok to put {} on the same line with the function
> prototype here, see e.g. include/linux/kasan.h

Done @ v3.

> > +{
> > +}
> > +
> > +static inline void kcsan_disable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_enable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_begin_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_end_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_atomic_next(int n)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_KCSAN */
> > +
> > +#endif /* _LINUX_KCSAN_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 2c2e56bd8913..9490e417bf4a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -31,6 +31,7 @@
> >  #include <linux/task_io_accounting.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/rseq.h>
> > +#include <linux/kcsan.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> >  struct audit_context;
> > @@ -1171,6 +1172,9 @@ struct task_struct {
> >  #ifdef CONFIG_KASAN
> >         unsigned int                    kasan_depth;
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       struct kcsan_ctx                kcsan_ctx;
> > +#endif
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         /* Index of current stored address in ret_stack: */
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 9e5cbe5eab7b..e229416c3314 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -161,6 +161,14 @@ struct task_struct init_task
> >  #ifdef CONFIG_KASAN
> >         .kasan_depth    = 1,
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       .kcsan_ctx = {
> > +               .disable                = 1,
> > +               .atomic_next            = 0,
> > +               .atomic_region          = 0,
> > +               .atomic_region_flat     = 0,
> > +       },
> > +#endif
> >  #ifdef CONFIG_TRACE_IRQFLAGS
> >         .softirqs_enabled = 1,
> >  #endif
> > diff --git a/init/main.c b/init/main.c
> > index 91f6ebb30ef0..4d814de017ee 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -93,6 +93,7 @@
> >  #include <linux/rodata_test.h>
> >  #include <linux/jump_label.h>
> >  #include <linux/mem_encrypt.h>
> > +#include <linux/kcsan.h>
> >
> >  #include <asm/io.h>
> >  #include <asm/bugs.h>
> > @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
> >         acpi_subsystem_init();
> >         arch_post_acpi_subsys_init();
> >         sfi_init_late();
> > +       kcsan_init();
> >
> >         /* Do the rest non-__init'ed, we're now alive */
> >         arch_call_rest_init();
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index daad787fb795..74ab46e2ebd1 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
> >  obj-$(CONFIG_IRQ_WORK) += irq_work.o
> >  obj-$(CONFIG_CPU_PM) += cpu_pm.o
> >  obj-$(CONFIG_BPF) += bpf/
> > +obj-$(CONFIG_KCSAN) += kcsan/
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> > new file mode 100644
> > index 000000000000..c25f07062d26
> > --- /dev/null
> > +++ b/kernel/kcsan/Makefile
> > @@ -0,0 +1,14 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +KCSAN_SANITIZE := n
> > +KCOV_INSTRUMENT := n
> > +
> > +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> > +
> > +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +
> > +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> > +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> > diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> > new file mode 100644
> > index 000000000000..dd44f7d9e491
> > --- /dev/null
> > +++ b/kernel/kcsan/atomic.c
> > @@ -0,0 +1,21 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/jiffies.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * List all volatile globals that have been observed in races, to suppress
> > + * data-race reports between accesses to these variables.
> > + *
> > + * For now, we assume that volatile accesses of globals are as strong as atomic
> > + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> > + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> > + * than cast to volatile. Eventually, we hope to be able to remove this
> > + * function.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr)
> > +{
> > +       /* only jiffies for now */
> > +       return ptr == &jiffies;
> > +}
> > diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> > new file mode 100644
> > index 000000000000..bc8d60b129eb
> > --- /dev/null
> > +++ b/kernel/kcsan/core.c
> > @@ -0,0 +1,428 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bug.h>
> > +#include <linux/delay.h>
> > +#include <linux/export.h>
> > +#include <linux/init.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/random.h>
> > +#include <linux/sched.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Helper macros to iterate slots, starting from address slot itself, followed
> > + * by the right and left slots.
> > + */
> > +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> > +#define SLOT_IDX(slot, i)                                                      \
> > +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> > +                 KCSAN_CHECK_ADJACENT)) %                                     \
> > +        KCSAN_NUM_WATCHPOINTS)
> > +
> > +bool kcsan_enabled;
> > +
> > +/* Per-CPU kcsan_ctx for interrupts */
> > +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> > +       .disable = 0,
> > +       .atomic_next = 0,
> > +       .atomic_region = 0,
> > +       .atomic_region_flat = 0,
> > +};
> > +
> > +/*
> > + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> > + * able to safely update and access a watchpoint without introducing locking
> > + * overhead, we encode each watchpoint as a single atomic long. The initial
> > + * zero-initialized state matches INVALID_WATCHPOINT.
> > + */
> > +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> > +
> > +/*
> > + * Instructions skipped counter; see should_watch().
> > + */
> > +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> > +
> > +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> > +                                            bool expect_write,
> > +                                            long *encoded_watchpoint)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> > +       atomic_long_t *watchpoint;
> > +       unsigned long wp_addr_masked;
> > +       size_t wp_size;
> > +       bool is_write;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               *encoded_watchpoint = atomic_long_read(watchpoint);
> > +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> > +                                      &wp_size, &is_write))
> > +                       continue;
> > +
> > +               if (expect_write && !is_write)
> > +                       continue;
> > +
> > +               /* Check if the watchpoint matches the access. */
> > +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> > +                                              bool is_write)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> > +       atomic_long_t *watchpoint;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               long expect_val = INVALID_WATCHPOINT;
> > +
> > +               /* Try to acquire this slot. */
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> > +                                                   encoded_watchpoint))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was successfully consumed, false otherwise.
> > + *
> > + * This may return false if:
> > + *
> > + *     1. another thread already consumed the watchpoint;
> > + *     2. the thread that set up the watchpoint already removed it;
> > + *     3. the watchpoint was removed and then re-used.
> > + */
> > +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> > +                                         long encoded_watchpoint)
> > +{
> > +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> > +                                              CONSUMED_WATCHPOINT);
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was not touched, false if consumed.
> > + */
> > +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> > +{
> > +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> > +              CONSUMED_WATCHPOINT;
> > +}
> > +
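
The slot life cycle implemented by the three helpers above (insert, consume, remove) can be sketched in userspace with C11 atomics. This is only an illustration under assumptions: the `sketch_*` names are hypothetical, `SKETCH_INVALID` is zero to mirror the zero-initialized `INVALID_WATCHPOINT`, and the value chosen for `SKETCH_CONSUMED` is arbitrary, since the real constant lives in encoding.h and is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_INVALID  0L /* mirrors the zero-initialized invalid state */
#define SKETCH_CONSUMED 1L /* arbitrary value, for illustration only */

/* Claim an empty slot; fails if another watchpoint already occupies it. */
static bool sketch_insert(atomic_long *slot, long encoded)
{
	long expect = SKETCH_INVALID;

	assert(slot);
	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Mark the watchpoint as hit; fails if it was consumed, removed or re-used. */
static bool sketch_try_consume(atomic_long *slot, long encoded)
{
	assert(slot);
	return atomic_compare_exchange_strong_explicit(
		slot, &encoded, SKETCH_CONSUMED,
		memory_order_relaxed, memory_order_relaxed);
}

/* Release the slot; returns true if nobody consumed the watchpoint. */
static bool sketch_remove(atomic_long *slot)
{
	assert(slot);
	return atomic_exchange_explicit(slot, SKETCH_INVALID,
					memory_order_relaxed) != SKETCH_CONSUMED;
}
```

Because every transition is a single compare-and-swap or exchange on one word, only one racing thread can win the consume step, and the thread that set up the watchpoint learns from the return value of the remove step whether it was hit.
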
> > +static inline struct kcsan_ctx *get_ctx(void)
> > +{
> > +       /*
> > +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> > +        * also result in calls that generate warnings in uaccess regions.
> > +        */
> > +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> > +}
> > +
> > +
> > +static inline bool is_atomic(const volatile void *ptr)
> > +{
> > +       struct kcsan_ctx *ctx = get_ctx();
> > +
> > +       if (unlikely(ctx->atomic_next > 0)) {
> > +               --ctx->atomic_next;
> > +               return true;
> > +       }
> > +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> > +               return true;
> Won't ctx->atomic_region suffice for both flat and non-flat regions?
> (Do we really need the flat ones?)

The comment in include/linux/kcsan.h explains:

/*
 * We use separate variables to store if we are in a nestable or flat
 * atomic region. This helps make sure that an atomic region with
 * nesting support is not suddenly aborted when a flat region is
 * contained within. Effectively this allows supporting nesting flat
 * atomic regions within an outer nestable atomic region. Support for
 * this is required as there are cases where a seqlock reader critical
 * section (flat atomic region) is contained within a seqlock writer
 * critical section (nestable atomic region), and the "mismatching
 * kcsan_end_atomic()" warning would trigger otherwise.
 */
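
As a rough illustration of that comment, here is a simplified userspace sketch. `struct ctx_sketch` and the `sketch_*` helpers are hypothetical stand-ins for `struct kcsan_ctx`, `kcsan_begin_atomic()` and `kcsan_end_atomic()`; the context is passed explicitly, and a mismatched end is an `assert()` here where the kernel code issues a WARN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct kcsan_ctx. */
struct ctx_sketch {
	int atomic_region;       /* depth of nestable atomic regions */
	bool atomic_region_flat; /* inside a flat atomic region? */
};

static void sketch_begin_atomic(struct ctx_sketch *ctx, bool nest)
{
	assert(ctx);
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void sketch_end_atomic(struct ctx_sketch *ctx, bool nest)
{
	assert(ctx);
	if (nest) {
		assert(ctx->atomic_region > 0); /* mismatching end */
		--ctx->atomic_region;
	} else {
		ctx->atomic_region_flat = false;
	}
}

/* Accesses are treated as atomic while either kind of region is open. */
static bool sketch_in_atomic(const struct ctx_sketch *ctx)
{
	assert(ctx);
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

If a writer section opens a nestable region and a reader section inside it opens and closes a flat region, `sketch_in_atomic()` still returns true after the flat end, because closing the flat region only clears the flag and leaves the writer's counter untouched. With a single shared variable, the flat end would either abort the outer region or unbalance the counter.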


> > +       return kcsan_is_atomic(ptr);
> > +}
> > +
> > +static inline bool should_watch(const volatile void *ptr)
> > +{
> > +       /*
> > +        * Never set up watchpoints when memory operations are atomic.
> > +        *
> > +        * We need to check this first, because: 1) atomics should not count
> > +        * towards skipped instructions below, and 2) to actually decrement
> > +        * kcsan_atomic_next for each atomic.
> > +        */
> > +       if (is_atomic(ptr))
> > +               return false;
> > +
> > +       /*
> > +        * We use a per-CPU counter, to avoid excessive contention; there is
> > +        * still enough non-determinism for the precise instructions that end up
> > +        * being watched to be mostly unpredictable. Using a PRNG like
> > +        * prandom_u32() turned out to be too slow.
> > +        */
> > +       return (this_cpu_inc_return(kcsan_skip) %
> > +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> > +}
> > +
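
For illustration, the counter-based sampling in should_watch() reduces to the following userspace sketch. `sketch_should_watch()` is a hypothetical name, the counter is an ordinary variable passed by pointer in place of the per-CPU `kcsan_skip`, and `period` plays the role of `CONFIG_KCSAN_WATCH_SKIP_INST`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Increment the counter and select every period-th access for a
 * watchpoint. No random number generator is involved.
 */
static bool sketch_should_watch(unsigned long *counter, unsigned long period)
{
	assert(counter);
	assert(period > 0);
	return (++*counter % period) == 0;
}
```

With a period of 4, exactly one in every four calls returns true. In the kernel, which instruction happens to land on that count depends on interrupts, scheduling and the atomics that are skipped before the counter is touched, which is where the non-determinism mentioned in the comment comes from.
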
> > +static inline bool is_enabled(void)
> > +{
> > +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> > +}
> > +
> > +static inline unsigned int get_delay(void)
> > +{
> > +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> > +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> > +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                      ((prandom_u32() % max_delay) + 1) :
> > +                      max_delay;
> > +}
> > +
> > +/* === Public interface ===================================================== */
> > +
> > +void __init kcsan_init(void)
> > +{
> > +       BUG_ON(!in_task());
> > +
> > +       kcsan_debugfs_init();
> > +       kcsan_enable_current();
> > +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> > +       /*
> > +        * We are in the init task, and no other tasks should be running.
> > +        */
> > +       WRITE_ONCE(kcsan_enabled, true);
> > +#endif
> > +}
> > +
> > +/* === Exported interface =================================================== */
> > +
> > +void kcsan_disable_current(void)
> > +{
> > +       ++get_ctx()->disable;
> > +}
> > +EXPORT_SYMBOL(kcsan_disable_current);
> > +
> > +void kcsan_enable_current(void)
> > +{
> > +       if (get_ctx()->disable-- == 0) {
> > +               kcsan_disable_current(); /* restore to 0 */
> > +               kcsan_disable_current();
> > +               WARN(1, "mismatching %s", __func__);
> > +               kcsan_enable_current();
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_enable_current);
> > +
> > +void kcsan_begin_atomic(bool nest)
> > +{
> > +       if (nest)
> > +               ++get_ctx()->atomic_region;
> > +       else
> > +               get_ctx()->atomic_region_flat = true;
> > +}
> > +EXPORT_SYMBOL(kcsan_begin_atomic);
> > +
> > +void kcsan_end_atomic(bool nest)
> > +{
> > +       if (nest) {
> > +               if (get_ctx()->atomic_region-- == 0) {
> > +                       kcsan_begin_atomic(true); /* restore to 0 */
> > +                       kcsan_disable_current();
> > +                       WARN(1, "mismatching %s", __func__);
> > +                       kcsan_enable_current();
> > +               }
> > +       } else {
> > +               get_ctx()->atomic_region_flat = false;
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_end_atomic);
> > +
> > +void kcsan_atomic_next(int n)
> > +{
> > +       get_ctx()->atomic_next = n;
> > +}
> > +EXPORT_SYMBOL(kcsan_atomic_next);
> > +
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       long encoded_watchpoint;
> > +       unsigned long flags;
> > +       enum kcsan_report_type report_type;
> > +
> > +       if (unlikely(!is_enabled()))
> > +               return false;
> > +
> > +       /*
> > +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> > +        * without user_access_save, as the address that ptr points to is only
> > +        * used to check if a watchpoint exists; ptr is never dereferenced.
> > +        */
> > +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> > +                                    &encoded_watchpoint);
> > +       if (watchpoint == NULL)
> > +               return true;
> > +
> > +       flags = user_access_save();
> > +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> > +               /*
> > +                * The other thread may not print any diagnostics, as it has
> > +                * already removed the watchpoint, or another thread consumed
> > +                * the watchpoint before this thread.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_report_races);
> > +               report_type = kcsan_report_race_check_race;
> > +       } else {
> > +               report_type = kcsan_report_race_check;
> > +       }
> > +
> > +       /* Encountered a data-race. */
> > +       kcsan_counter_inc(kcsan_counter_data_races);
> > +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> > +
> > +       user_access_restore(flags);
> > +       return false;
> > +}
> > +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> > +
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       union {
> > +               u8 _1;
> > +               u16 _2;
> > +               u32 _4;
> > +               u64 _8;
> > +       } expect_value;
> > +       bool is_expected = true;
> > +       unsigned long ua_flags = user_access_save();
> > +       unsigned long irq_flags;
> > +
> > +       if (!should_watch(ptr))
> > +               goto out;
> > +
> > +       if (!check_encodable((unsigned long)ptr, size)) {
> > +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> > +               goto out;
> > +       }
> > +
> > +       /*
> > +        * Disable interrupts & preemptions to avoid another thread on the same
> > +        * CPU accessing memory locations for the set up watchpoint; this is to
> > +        * avoid reporting races to e.g. CPU-local data.
> > +        *
> > +        * An alternative would be adding the source CPU to the watchpoint
> > +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> > +        * several problems with this:
> > +        *   1. we should avoid stealing more bits from the watchpoint encoding
> > +        *      as it would affect accuracy, as well as increase performance
> > +        *      overhead in the fast-path;
> > +        *   2. if we are preempted, but there *is* a genuine data-race, we
> > +        *      would *not* report it -- since this is the common case (vs.
> > +        *      CPU-local data accesses), it makes more sense (from a data-race
> > +        *      detection PoV) to simply disable preemptions to ensure as many
> > +        *      tasks as possible run on other CPUs.
> > +        */
> > +       local_irq_save(irq_flags);
> > +
> > +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> > +       if (watchpoint == NULL) {
> > +               /*
> > +                * Out of capacity: the size of `watchpoints`, and the frequency
> > +                * with which `should_watch()` returns true should be tweaked so
> > +                * that this case happens very rarely.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_no_capacity);
> > +               goto out_unlock;
> > +       }
> > +
> > +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> > +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> > +
> > +       /*
> > +        * Read the current value, to later check and infer a race if the data
> > +        * was modified via a non-instrumented access, e.g. from a device.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +#ifdef CONFIG_KCSAN_DEBUG
> > +       kcsan_disable_current();
> > +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +              is_write ? "write" : "read", size, ptr,
> > +              watchpoint_slot((unsigned long)ptr),
> > +              encode_watchpoint((unsigned long)ptr, size, is_write));
> > +       kcsan_enable_current();
> > +#endif
> > +
> > +       /*
> > +        * Delay this thread, to increase probability of observing a racy
> > +        * conflicting access.
> > +        */
> > +       udelay(get_delay());
> > +
> > +       /*
> > +        * Re-read value, and check if it is as expected; if not, we infer a
> > +        * racy access.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +       /* Check if this access raced with another. */
> > +       if (!remove_watchpoint(watchpoint)) {
> > +               /*
> > +                * No need to increment 'race' counter, as the racing thread
> > +                * already did.
> > +                */
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_setup);
> > +       } else if (!is_expected) {
> > +               /* Inferring a race, since the value should not have changed. */
> > +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_unknown_origin);
> > +#endif
> > +       }
> > +
> > +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> > +out_unlock:
> > +       local_irq_restore(irq_flags);
> > +out:
> > +       user_access_restore(ua_flags);
> > +}
> > +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> > diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> > new file mode 100644
> > index 000000000000..6ddcbd185f3a
> > --- /dev/null
> > +++ b/kernel/kcsan/debugfs.c
> > @@ -0,0 +1,225 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bsearch.h>
> > +#include <linux/bug.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/init.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/mm.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/sort.h>
> > +#include <linux/string.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * Statistics counters.
> > + */
> > +static atomic_long_t counters[kcsan_counter_count];
> > +
> > +/*
> > + * Addresses for filtering functions from reporting. This list can be used as a
> > + * whitelist or blacklist.
> > + */
> > +static struct {
> > +       unsigned long *addrs; /* array of addresses */
> > +       size_t size; /* current size */
> > +       int used; /* number of elements used */
> > +       bool sorted; /* if elements are sorted */
> > +       bool whitelist; /* if list is a blacklist or whitelist */
> > +} report_filterlist = {
> > +       .addrs = NULL,
> > +       .size = 8, /* small initial size */
> > +       .used = 0,
> > +       .sorted = false,
> > +       .whitelist = false, /* default is blacklist */
> > +};
> > +static DEFINE_SPINLOCK(report_filterlist_lock);
> > +
> > +static const char *counter_to_name(enum kcsan_counter_id id)
> > +{
> > +       switch (id) {
> > +       case kcsan_counter_used_watchpoints:
> > +               return "used_watchpoints";
> > +       case kcsan_counter_setup_watchpoints:
> > +               return "setup_watchpoints";
> > +       case kcsan_counter_data_races:
> > +               return "data_races";
> > +       case kcsan_counter_no_capacity:
> > +               return "no_capacity";
> > +       case kcsan_counter_report_races:
> > +               return "report_races";
> > +       case kcsan_counter_races_unknown_origin:
> > +               return "races_unknown_origin";
> > +       case kcsan_counter_unencodable_accesses:
> > +               return "unencodable_accesses";
> > +       case kcsan_counter_encoding_false_positives:
> > +               return "encoding_false_positives";
> > +       case kcsan_counter_count:
> > +               BUG();
> > +       }
> > +       return NULL;
> > +}
> > +
> > +void kcsan_counter_inc(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_inc(&counters[id]);
> > +}
> > +
> > +void kcsan_counter_dec(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_dec(&counters[id]);
> > +}
> > +
> > +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> > +{
> > +       const unsigned long a = *(const unsigned long *)rhs;
> > +       const unsigned long b = *(const unsigned long *)lhs;
> > +
> > +       return a < b ? -1 : a == b ? 0 : 1;
> > +}
> > +
> > +bool kcsan_skip_report(unsigned long func_addr)
> > +{
> > +       unsigned long symbolsize, offset;
> > +       unsigned long flags;
> > +       bool ret = false;
> > +
> > +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> > +               return false;
> > +       func_addr -= offset; /* get function start */
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       if (report_filterlist.used == 0)
> > +               goto out;
> > +
> > +       /* Sort array if it is unsorted, and then do a binary search. */
> > +       if (!report_filterlist.sorted) {
> > +               sort(report_filterlist.addrs, report_filterlist.used,
> > +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> > +               report_filterlist.sorted = true;
> > +       }
> > +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> > +                       report_filterlist.used, sizeof(unsigned long),
> > +                       cmp_filterlist_addrs);
> > +       if (report_filterlist.whitelist)
> > +               ret = !ret;
> > +
> > +out:
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +       return ret;
> > +}
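
The lookup above is a sort followed by a binary search, with the result
inverted in whitelist mode. A minimal userspace sketch, assuming qsort()
and bsearch() from the C library in place of the kernel's sort() and
bsearch(), with no locking and hypothetical names; unlike the patch it
sorts on every call instead of caching a `sorted` flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two addresses, as qsort()/bsearch() expect. */
static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if reports for func_addr should be skipped. */
static bool skip_report_sketch(unsigned long *addrs, size_t used,
			       bool whitelist, unsigned long func_addr)
{
	bool found;

	/* An empty list never filters anything, in either mode. */
	if (used == 0)
		return false;

	qsort(addrs, used, sizeof(*addrs), cmp_addrs);
	found = bsearch(&func_addr, addrs, used, sizeof(*addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip all others. */
	return whitelist ? !found : found;
}
```
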
> > +
> > +static void set_report_filterlist_whitelist(bool whitelist)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       report_filterlist.whitelist = whitelist;
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static void insert_report_filterlist(const char *func)
> > +{
> > +       unsigned long flags;
> > +       unsigned long addr = kallsyms_lookup_name(func);
> > +
> > +       if (!addr) {
> > +               pr_err("KCSAN: could not find function: '%s'\n", func);
> > +               return;
> > +       }
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +
> > +       if (report_filterlist.addrs == NULL)
> > +               report_filterlist.addrs = /* initial allocation */
> > +                       kvmalloc_array(report_filterlist.size,
> > +                                      sizeof(unsigned long), GFP_KERNEL);
> You need to use braces in both branches here:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

Done @ v3.
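
For anyone reading along, the style rule amounts to the shape below: once
one branch of an if/else needs braces, all branches get them. This is a
minimal userspace sketch, not the v3 code; malloc() stands in for
kvmalloc_array(), the names are hypothetical, the initial capacity is
assumed to be non-zero, and the allocation-failure checks are an addition
of the sketch.

```c
#include <stdlib.h>
#include <string.h>

struct addr_list {
	unsigned long *addrs; /* array of addresses */
	size_t size;          /* current capacity, must start > 0 */
	size_t used;          /* number of elements used */
};

/* Appends addr, growing the array when full. Returns 0, or -1 on failure. */
static int addr_list_append(struct addr_list *list, unsigned long addr)
{
	if (list->addrs == NULL) {
		/* initial allocation */
		list->addrs = malloc(list->size * sizeof(*list->addrs));
		if (list->addrs == NULL)
			return -1;
	} else if (list->used == list->size) {
		/* double the capacity and copy the old contents over */
		unsigned long *new_addrs;

		new_addrs = malloc(2 * list->size * sizeof(*new_addrs));
		if (new_addrs == NULL)
			return -1;
		memcpy(new_addrs, list->addrs,
		       list->used * sizeof(*new_addrs));
		free(list->addrs);
		list->addrs = new_addrs;
		list->size *= 2;
	}

	list->addrs[list->used++] = addr;
	return 0;
}
```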

> > +       else if (report_filterlist.used == report_filterlist.size) {
> > +               /* resize filterlist */
> > +               unsigned long *new_addrs;
> > +
> > +               report_filterlist.size *= 2;
> > +               new_addrs = kvmalloc_array(report_filterlist.size,
> > +                                          sizeof(unsigned long), GFP_KERNEL);
> > +               memcpy(new_addrs, report_filterlist.addrs,
> > +                      report_filterlist.used * sizeof(unsigned long));
> > +               kvfree(report_filterlist.addrs);
> > +               report_filterlist.addrs = new_addrs;
> > +       }
> > +
> > +       /* Note: deduplicating should be done in userspace. */
> > +       report_filterlist.addrs[report_filterlist.used++] =
> > +               kallsyms_lookup_name(func);
> > +       report_filterlist.sorted = false;
> > +
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static int show_info(struct seq_file *file, void *v)
> > +{
> > +       int i;
> > +       unsigned long flags;
> > +
> > +       /* show stats */
> > +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> > +       for (i = 0; i < kcsan_counter_count; ++i)
> > +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> > +                          atomic_long_read(&counters[i]));
> > +
> > +       /* show filter functions, and filter type */
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       seq_printf(file, "\n%s functions: %s\n",
> > +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> > +                  report_filterlist.used == 0 ? "none" : "");
> > +       for (i = 0; i < report_filterlist.used; ++i)
> > +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +
> > +       return 0;
> > +}
> > +
> > +static int debugfs_open(struct inode *inode, struct file *file)
> > +{
> > +       return single_open(file, show_info, NULL);
> > +}
> > +
> > +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> > +                            size_t count, loff_t *off)
> > +{
> > +       char kbuf[KSYM_NAME_LEN];
> > +       char *arg;
> > +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> > +
> > +       if (copy_from_user(kbuf, buf, read_len))
> > +               return -EINVAL;
> > +       kbuf[read_len] = '\0';
> > +       arg = strstrip(kbuf);
> > +
> > +       if (!strncmp(arg, "on", sizeof("on") - 1))
> > +               WRITE_ONCE(kcsan_enabled, true);
> > +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> > +               WRITE_ONCE(kcsan_enabled, false);
> > +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> > +               set_report_filterlist_whitelist(true);
> > +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> > +               set_report_filterlist_whitelist(false);
> > +       else if (arg[0] == '!')
> > +               insert_report_filterlist(&arg[1]);
> > +       else
> > +               return -EINVAL;
> > +
> > +       return count;
> > +}
> > +
> > +static const struct file_operations debugfs_ops = { .read = seq_read,
> > +                                                   .open = debugfs_open,
> > +                                                   .write = debugfs_write,
> > +                                                   .release = single_release };
> > +
> > +void __init kcsan_debugfs_init(void)
> > +{
> > +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> > +}
> > diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> > new file mode 100644
> > index 000000000000..8f9b1ce0e59f
> > --- /dev/null
> > +++ b/kernel/kcsan/encoding.h
> > @@ -0,0 +1,94 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_ENCODING_H
> > +#define _MM_KCSAN_ENCODING_H
> > +
> > +#include <linux/bits.h>
> > +#include <linux/log2.h>
> > +#include <linux/mm.h>
> > +
> > +#include "kcsan.h"
> > +
> > +#define SLOT_RANGE PAGE_SIZE
> > +#define INVALID_WATCHPOINT 0
> > +#define CONSUMED_WATCHPOINT 1
> > +
> > +/*
> > + * The maximum useful size of accesses for which we set up watchpoints is the
> > + * max range of slots we check on an access.
> > + */
> > +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> > +
> > +/*
> > + * Number of bits we use to store size info.
> > + */
> > +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> > +/*
> > + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> > + * however, most 64-bit architectures do not use the full 64-bit address space.
> > + * Also, in order for a false positive to be observable 2 things need to happen:
> > + *
> > + *     1. different addresses but with the same encoded address race;
> > + *     2. and both map onto the same watchpoint slots;
> > + *
> > + * Both these are assumed to be very unlikely. However, in case it still
> > + * happens, the report logic will filter out the false positive (see report.c).
> > + */
> > +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> > +
> > +/*
> > + * Masks to set/retrieve the encoded data.
> > + */
> > +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> > +#define WATCHPOINT_SIZE_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> > +#define WATCHPOINT_ADDR_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> > +
> > +static inline bool check_encodable(unsigned long addr, size_t size)
> > +{
> > +       return size <= MAX_ENCODABLE_SIZE;
> > +}
> > +
> > +static inline long encode_watchpoint(unsigned long addr, size_t size,
> > +                                    bool is_write)
> > +{
> > +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> > +                     (size << WATCHPOINT_ADDR_BITS) |
> > +                     (addr & WATCHPOINT_ADDR_MASK));
> > +}
> > +
> > +static inline bool decode_watchpoint(long watchpoint,
> > +                                    unsigned long *addr_masked, size_t *size,
> > +                                    bool *is_write)
> > +{
> > +       if (watchpoint == INVALID_WATCHPOINT ||
> > +           watchpoint == CONSUMED_WATCHPOINT)
> > +               return false;
> > +
> > +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> > +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> > +               WATCHPOINT_ADDR_BITS;
> > +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> > +
> > +       return true;
> > +}
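
The encoding packs the is-write flag, the access size, and the low address
bits into one word. A minimal userspace round-trip sketch, assuming a
64-bit word and 4 KiB pages (so MAX_ENCODABLE_SIZE is 8192 and the size
field is 14 bits wide). The masks here are derived directly from ADDR_BITS,
which is a simplification and not the exact macro definitions in the patch;
all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 64-bit words, 4 KiB pages, so bits_per(8192) == 14. */
#define SIZE_BITS 14
#define ADDR_BITS (64 - 1 - SIZE_BITS)

#define WRITE_MASK (UINT64_C(1) << 63)
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define SIZE_MASK (~(WRITE_MASK | ADDR_MASK))

/* Packs is_write (bit 63), size (next 14 bits) and the low address bits. */
static uint64_t encode_sketch(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Unpacks the three fields; the upper address bits are not recoverable. */
static void decode_sketch(uint64_t wp, uint64_t *addr_masked, size_t *size,
			  bool *is_write)
{
	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
}
```

Two addresses that differ only above bit ADDR_BITS - 1 encode to the same
value, which is the source of the potential false positives described in
the comment above.
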
> > +
> > +/*
> > + * Return watchpoint slot for an address.
> > + */
> > +static inline int watchpoint_slot(unsigned long addr)
> > +{
> > +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> > +}
> > +
> > +static inline bool matching_access(unsigned long addr1, size_t size1,
> > +                                  unsigned long addr2, size_t size2)
> > +{
> > +       unsigned long end_range1 = addr1 + size1 - 1;
> > +       unsigned long end_range2 = addr2 + size2 - 1;
> > +
> > +       return addr1 <= end_range2 && addr2 <= end_range1;
> > +}
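
The overlap test above treats each access as an inclusive byte range and
checks that neither range ends before the other begins. A minimal
standalone sketch with a hypothetical name, assuming both sizes are at
least 1:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if [addr1, addr1+size1) and [addr2, addr2+size2) overlap. */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte, inclusive */
	unsigned long end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

For example, an 8-byte access at 0x1000 overlaps a 4-byte access at 0x1004,
while two adjacent 4-byte accesses at 0x1000 and 0x1004 do not.
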
> > +
> > +#endif /* _MM_KCSAN_ENCODING_H */
> > diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> > new file mode 100644
> > index 000000000000..45cf2fffd8a0
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> > + * see Documentation/dev-tools/kcsan.rst.
> > + */
> > +
> > +#include <linux/export.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * KCSAN uses the same instrumentation that is emitted by supported compilers
> > + * for Thread Sanitizer (TSAN).
> > + *
> > + * When enabled, the compiler emits instrumentation calls (the functions
> > + * prefixed with "__tsan" below) for all loads and stores that it generated;
> > + * inline asm is not instrumented.
> > + */
> > +
> > +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> > +       void __tsan_read##size(void *ptr)                                      \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> > +       void __tsan_write##size(void *ptr)                                     \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_write##size)
> > +
> > +DEFINE_TSAN_READ_WRITE(1);
> > +DEFINE_TSAN_READ_WRITE(2);
> > +DEFINE_TSAN_READ_WRITE(4);
> > +DEFINE_TSAN_READ_WRITE(8);
> > +DEFINE_TSAN_READ_WRITE(16);
> > +
> > +/*
> > + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> > + * but e.g. recent versions of Clang do.
> > + */
> > +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> > +       void __tsan_unaligned_read##size(void *ptr)                            \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> > +       void __tsan_unaligned_write##size(void *ptr)                           \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> > +
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> > +
> > +void __tsan_read_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_read(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_read_range);
> > +
> > +void __tsan_write_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_write(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_write_range);
> > +
> > +/*
> > + * The below are not required by KCSAN, but can still be emitted by the compiler.
> > + */
> > +void __tsan_func_entry(void *call_pc)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_entry);
> > +void __tsan_func_exit(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_exit);
> > +void __tsan_init(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_init);
> > diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> > new file mode 100644
> > index 000000000000..429479b3041d
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.h
> > @@ -0,0 +1,140 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_KCSAN_H
> > +#define _MM_KCSAN_KCSAN_H
> > +
> > +#include <linux/kcsan.h>
> > +
> > +/*
> > + * Total number of watchpoints. An address range maps into a specific slot as
> > + * specified in `encoding.h`. Although a larger number of watchpoints may not
> > + * even be usable due to limited thread count, a larger value will improve
> > + * performance due to reducing cache-line contention.
> > + */
> > +#define KCSAN_NUM_WATCHPOINTS 64
> > +
> > +/*
> > + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> > + *
> > + *     1. the address slot is already occupied, check if any adjacent slots are
> > + *        free;
> > + *     2. accesses that straddle a slot boundary due to size that exceeds a
> > + *        slot's range may check adjacent slots if any watchpoint matches.
> > + *
> > + * Note that accesses with very large size may still miss a watchpoint; however,
> > + * given this should be rare, this is a reasonable trade-off to make, since this
> > + * will avoid:
> > + *
> > + *     1. excessive contention between watchpoint checks and setup;
> > + *     2. larger number of simultaneous watchpoints without sacrificing
> > + *        performance.
> > + */
> > +#define KCSAN_CHECK_ADJACENT 1
> > +
> > +/*
> > + * Globally enable and disable KCSAN.
> > + */
> > +extern bool kcsan_enabled;
> > +
> > +/*
> > + * Helper that returns true if access to ptr should be considered as an atomic
> > + * access, even though it is not explicitly atomic.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr);
> > +
> > +/*
> > + * Initialize debugfs file.
> > + */
> > +void kcsan_debugfs_init(void);
> > +
> > +enum kcsan_counter_id {
> Labels in enums should be capitalized:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl

Done @ v3.
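
For reference, the convention asks for upper-case enum labels, roughly as
in the sketch below. This is an abbreviated illustration with hypothetical
type and function names, showing only three of the counters; it is not the
v3 code.

```c
#include <stddef.h>

enum counter_id_sketch {
	KCSAN_COUNTER_USED_WATCHPOINTS,
	KCSAN_COUNTER_SETUP_WATCHPOINTS,
	KCSAN_COUNTER_DATA_RACES,
	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* Maps a counter id to its debugfs name; NULL for out-of-range ids. */
static const char *counter_name_sketch(enum counter_id_sketch id)
{
	switch (id) {
	case KCSAN_COUNTER_USED_WATCHPOINTS:
		return "used_watchpoints";
	case KCSAN_COUNTER_SETUP_WATCHPOINTS:
		return "setup_watchpoints";
	case KCSAN_COUNTER_DATA_RACES:
		return "data_races";
	case KCSAN_COUNTER_COUNT:
		break;
	}
	return NULL;
}
```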

> > +       /*
> > +        * Number of watchpoints currently in use.
> > +        */
> > +       kcsan_counter_used_watchpoints,
> > +
> > +       /*
> > +        * Total number of watchpoints set up.
> > +        */
> > +       kcsan_counter_setup_watchpoints,
> > +
> > +       /*
> > +        * Total number of data-races.
> > +        */
> > +       kcsan_counter_data_races,
> > +
> > +       /*
> > +        * Number of times no watchpoints were available.
> > +        */
> > +       kcsan_counter_no_capacity,
> > +
> > +       /*
> > +        * A thread checking a watchpoint raced with another checking thread;
> > +        * only one will be reported.
> > +        */
> > +       kcsan_counter_report_races,
> > +
> > +       /*
> > +        * Observed data value change, but writer thread unknown.
> > +        */
> > +       kcsan_counter_races_unknown_origin,
> > +
> > +       /*
> > +        * The access cannot be encoded to a valid watchpoint.
> > +        */
> > +       kcsan_counter_unencodable_accesses,
> > +
> > +       /*
> > +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> > +        * accesses.
> > +        */
> > +       kcsan_counter_encoding_false_positives,
> > +
> > +       kcsan_counter_count, /* number of counters */
> > +};
> > +
> > +/*
> > + * Increment/decrement counter with given id; avoid calling these in fast-path.
> > + */
> > +void kcsan_counter_inc(enum kcsan_counter_id id);
> > +void kcsan_counter_dec(enum kcsan_counter_id id);
> > +
> > +/*
> > + * Returns true if data-races in the function symbol that maps to addr (offsets
> > + * are ignored) should *not* be reported.
> > + */
> > +bool kcsan_skip_report(unsigned long func_addr);
> > +
> > +enum kcsan_report_type {
> > +       /*
> > +        * The thread that set up the watchpoint and briefly stalled was
> > +        * signalled that another thread triggered the watchpoint, and thus a
> > +        * race was encountered.
> > +        */
> > +       kcsan_report_race_setup,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, therefore a race
> > +        * was encountered.
> > +        */
> > +       kcsan_report_race_check,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, but the other
> > +        * racing thread can no longer be signaled that a race occurred.
> > +        */
> > +       kcsan_report_race_check_race,
> > +
> > +       /*
> > +        * No other thread was observed to race with the access, but the data
> > +        * value before and after the stall differs.
> > +        */
> > +       kcsan_report_race_unknown_origin,
> > +};
> > +/*
> > + * Print a race report from thread that encountered the race.
> > + */
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type);
> > +
> > +#endif /* _MM_KCSAN_KCSAN_H */
> > diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> > new file mode 100644
> > index 000000000000..517db539e4e7
> > --- /dev/null
> > +++ b/kernel/kcsan/report.c
> > @@ -0,0 +1,306 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/preempt.h>
> > +#include <linux/printk.h>
> > +#include <linux/sched.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/stacktrace.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Max. number of stack entries to show in the report.
> > + */
> > +#define NUM_STACK_ENTRIES 16
> > +
> > +/*
> > + * Other thread info: communicated from other racing thread to thread that set
> > + * up the watchpoint, which then prints the complete report atomically. Only
> > + * need one struct, as all threads should be serialized regardless to print
> > + * the reports, with reporting being in the slow-path.
> > + */
> > +static struct {
> > +       const volatile void *ptr;
> > +       size_t size;
> > +       bool is_write;
> > +       int task_pid;
> > +       int cpu_id;
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> > +       int num_stack_entries;
> > +} other_info = { .ptr = NULL };
> > +
> > +static DEFINE_SPINLOCK(other_info_lock);
> > +static DEFINE_SPINLOCK(report_lock);
> > +
> > +static bool set_or_lock_other_info(unsigned long *flags,
> > +                                  const volatile void *ptr, size_t size,
> > +                                  bool is_write, int cpu_id,
> > +                                  enum kcsan_report_type type)
> > +{
> > +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> > +               return true;
> > +
> > +       for (;;) {
> > +               spin_lock_irqsave(&other_info_lock, *flags);
> > +
> > +               switch (type) {
> > +               case kcsan_report_race_check:
> > +                       if (other_info.ptr != NULL) {
> > +                               /* still in use, retry */
> > +                               break;
> > +                       }
> > +                       other_info.ptr = ptr;
> > +                       other_info.size = size;
> > +                       other_info.is_write = is_write;
> > +                       other_info.task_pid =
> > +                               in_task() ? task_pid_nr(current) : -1;
> > +                       other_info.cpu_id = cpu_id;
> > +                       other_info.num_stack_entries = stack_trace_save(
> > +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> > +                       /*
> > +                        * other_info may now be consumed by thread we raced
> > +                        * with.
> > +                        */
> > +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> > +                       return false;
> > +
> > +               case kcsan_report_race_setup:
> > +                       if (other_info.ptr == NULL)
> > +                               break; /* no data available yet, retry */
> > +
> > +                       /*
> > +                        * First check if matching based on how watchpoint was
> > +                        * encoded.
> > +                        */
> > +                       if (!matching_access((unsigned long)other_info.ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            size))
> > +                               break; /* mismatching access, retry */
> > +
> > +                       if (!matching_access((unsigned long)other_info.ptr,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr, size)) {
> > +                               /*
> > +                                * If the actual accesses do not match, this was
> > +                                * a false positive due to watchpoint encoding.
> > +                                */
> > +                               other_info.ptr = NULL; /* mark for reuse */
> > +                               kcsan_counter_inc(
> > +                                       kcsan_counter_encoding_false_positives);
> > +                               spin_unlock_irqrestore(&other_info_lock,
> > +                                                      *flags);
> > +                               return false;
> > +                       }
> > +
> > +                       /*
> > +                        * Matching access: keep other_info locked, as this
> > +                        * thread uses it to print the full report; unlocked in
> > +                        * end_report.
> > +                        */
> > +                       return true;
> > +
> > +               default:
> > +                       BUG();
> > +               }
> > +
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +       }
> > +}
> > +
> > +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               /* irqsaved already via other_info_lock */
> > +               spin_lock(&report_lock);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_lock_irqsave(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               other_info.ptr = NULL; /* mark for reuse */
> > +               spin_unlock(&report_lock);
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_unlock_irqrestore(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static const char *get_access_type(bool is_write)
> > +{
> > +       return is_write ? "write" : "read";
> > +}
> > +
> > +/* Return thread description: in task or interrupt. */
> > +static const char *get_thread_desc(int task_id)
> > +{
> > +       if (task_id != -1) {
> > +               static char buf[32]; /* safe: protected by report_lock */
> > +
> > +               snprintf(buf, sizeof(buf), "task %i", task_id);
> > +               return buf;
> > +       }
> > +       return in_nmi() ? "NMI" : "interrupt";
> > +}
> > +
> > +/* Helper to skip KCSAN-related functions in stack-trace. */
> > +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> > +{
> > +       char buf[64];
> > +       int skip = 0;
> > +
> > +       for (; skip < num_entries; ++skip) {
> > +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> > +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> > +                       break;
> > +               }
> > +       }
> > +       return skip;
> > +}
> FWIW another option is to put all KCSAN-related functions in a
> separate code section and check if the function addresses are in the
> address range belonging to that section.
> This will work even with non-symbolized stacks.

Thanks for the suggestion. Is it worth it, i.e. will it simplify the
design and code? If it simplifies the design (or makes the fast-path
significantly faster), then yes, but otherwise I prefer the simplest
possible solution here. AFAIK, it will make it neither simpler nor
faster. Using non-symbolized stacks should not be the common use-case
(how would one usefully debug a data-race without symbols?).
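
For reference, a minimal userspace sketch of the section-based check
suggested above. The bounds kcsan_text_start/kcsan_text_end and the
helper names are hypothetical stand-ins for linker-provided section
markers; nothing in this series defines them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical section bounds. In a kernel these would come from the linker
 * script; here they are plain variables so the sketch runs in userspace.
 */
static uintptr_t kcsan_text_start;
static uintptr_t kcsan_text_end; /* exclusive */

static bool in_kcsan_text(uintptr_t addr)
{
	return addr >= kcsan_text_start && addr < kcsan_text_end;
}

/* Return the number of leading stack entries that lie inside the section. */
static int get_stack_skipnr_by_range(const uintptr_t entries[], int num_entries)
{
	int skip = 0;

	while (skip < num_entries && in_kcsan_text(entries[skip]))
		++skip;
	return skip;
}
```

This variant needs no symbolization, but every runtime function that
should be skipped would have to carry a section annotation, which is
the extra complexity weighed above.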

> > +/* Compares symbolized strings of addr1 and addr2. */
> > +static int sym_strcmp(void *addr1, void *addr2)
> > +{
> > +       char buf1[64];
> > +       char buf2[64];
> > +
> > +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> > +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> > +       return strncmp(buf1, buf2, sizeof(buf1));
> > +}
> > +
> > +/*
> > + * Returns true if a report was generated, false otherwise.
> > + */
> > +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> > +                         int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> > +       int num_stack_entries =
> > +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> > +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> > +       int other_skipnr;
> > +
> > +       /* Check if the top stackframe is in a blacklisted function. */
> > +       if (kcsan_skip_report(stack_entries[skipnr]))
> > +               return false;
> > +       if (type == kcsan_report_race_setup) {
> > +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> > +                                               other_info.num_stack_entries);
> > +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> > +                       return false;
> > +       }
> > +
> > +       /* Print report header. */
> > +       pr_err("==================================================================\n");
> > +       switch (type) {
> > +       case kcsan_report_race_setup: {
> > +               void *this_fn = (void *)stack_entries[skipnr];
> > +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> > +               int cmp;
> > +
> > +               /*
> > +                * Order functions lexicographically for consistent bug titles.
> > +                * Do not print offset of functions to keep title short.
> > +                */
> > +               cmp = sym_strcmp(other_fn, this_fn);
> > +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> > +                      cmp < 0 ? other_fn : this_fn,
> > +                      cmp < 0 ? this_fn : other_fn);
> > +       } break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("BUG: KCSAN: data-race in %pS\n",
> > +                      (void *)stack_entries[skipnr]);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +
> > +       pr_err("\n");
> > +
> > +       /* Print information about the racing accesses. */
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(other_info.is_write), other_info.ptr,
> > +                      other_info.size, get_thread_desc(other_info.task_pid),
> > +                      other_info.cpu_id);
> > +
> > +               /* Print the other thread's stack trace. */
> > +               stack_trace_print(other_info.stack_entries + other_skipnr,
> > +                                 other_info.num_stack_entries - other_skipnr,
> > +                                 0);
> > +
> > +               pr_err("\n");
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +       /* Print stack trace of this thread. */
> > +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> > +                         0);
> > +
> > +       /* Print report footer. */
> > +       pr_err("\n");
> > +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> > +       dump_stack_print_info(KERN_DEFAULT);
> > +       pr_err("==================================================================\n");
> > +
> > +       return true;
> > +}
> > +
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long flags = 0;
> > +
> > +       if (type == kcsan_report_race_check_race)
> > +               return;
> > +
> > +       kcsan_disable_current();
> > +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> > +               start_report(&flags, type);
> > +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> > +                   panic_on_warn)
> > +                       panic("panic_on_warn set ...\n");
> > +
> > +               end_report(&flags, type);
> > +       }
> > +       kcsan_enable_current();
> > +}
> > diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> > new file mode 100644
> > index 000000000000..68c896a24529
> > --- /dev/null
> > +++ b/kernel/kcsan/test.c
> > @@ -0,0 +1,117 @@
> > +// SPDX-License-Identifier: GPL-2.0
> IIRC checkpatch.pl requires all SPDX headers to look like this one
> (C++-style, not C-style).
> Please double check and fix the headers in other files if necessary.

Checkpatch is happy. // for .c, and /**/ for .h.

> This file might also use some comments, now it's not easy to
> understand what it's testing.

Done @ v3.
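
To illustrate what test_matching_access() in the quoted test.c checks:
its five cases are consistent with a plain inclusive range-overlap
test. A sketch under that assumption follows; the name ranges_overlap
is made up here, the real matching_access() lives in encoding.h.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True if the byte ranges [addr1, addr1 + size1) and [addr2, addr2 + size2)
 * share at least one byte. Sizes are assumed to be non-zero.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of access 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of access 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

So a 2-byte access at 10 matches a 1-byte access at 11, while two
1-byte accesses at 10 and 11 do not.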

> > +
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/printk.h>
> > +#include <linux/random.h>
> > +#include <linux/types.h>
> > +
> > +#include "encoding.h"
> > +
> > +#define ITERS_PER_TEST 2000
> > +
> > +/* Test requirements. */
> > +static bool test_requires(void)
> > +{
> > +       /* random should be initialized */
> > +       return prandom_u32() + prandom_u32() != 0;
> > +}
> > +
> > +/* Test watchpoint encode and decode. */
> > +static bool test_encode_decode(void)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> > +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> > +               bool is_write = prandom_u32() % 2;
> > +               unsigned long addr;
> > +
> > +               prandom_bytes(&addr, sizeof(addr));
> > +               if (WARN_ON(!check_encodable(addr, size)))
> > +                       return false;
> > +
> > +               /* encode and decode */
> > +               {
> > +                       const long encoded_watchpoint =
> > +                               encode_watchpoint(addr, size, is_write);
> > +                       unsigned long verif_masked_addr;
> > +                       size_t verif_size;
> > +                       bool verif_is_write;
> > +
> > +                       /* check special watchpoints */
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +
> > +                       /* check decoding watchpoint returns same data */
> > +                       if (WARN_ON(!decode_watchpoint(
> > +                                   encoded_watchpoint, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(verif_masked_addr !=
> > +                                   (addr & WATCHPOINT_ADDR_MASK)))
> > +                               goto fail;
> > +                       if (WARN_ON(verif_size != size))
> > +                               goto fail;
> > +                       if (WARN_ON(is_write != verif_is_write))
> > +                               goto fail;
> > +
> > +                       continue;
> > +fail:
> > +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> > +                              __func__, is_write ? "write" : "read", size,
> > +                              addr, encoded_watchpoint,
> > +                              verif_is_write ? "write" : "read", verif_size,
> > +                              verif_masked_addr);
> > +                       return false;
> > +               }
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static bool test_matching_access(void)
> > +{
> > +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> > +               return false;
> > +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> > +               return false;
> > +       return true;
> > +}
> > +
> > +static int __init kcsan_selftest(void)
> > +{
> > +       int passed = 0;
> > +       int total = 0;
> > +
> > +#define RUN_TEST(do_test)                                                      \
> > +       do {                                                                   \
> > +               ++total;                                                       \
> > +               if (do_test())                                                 \
> > +                       ++passed;                                              \
> > +               else                                                           \
> > +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> > +       } while (0)
> > +
> > +       RUN_TEST(test_requires);
> > +       RUN_TEST(test_encode_decode);
> > +       RUN_TEST(test_matching_access);
> > +
> > +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> > +       if (passed != total)
> > +               panic("KCSAN selftests failed");
> > +       return 0;
> > +}
> > +postcore_initcall(kcsan_selftest);
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 93d97f9b0157..35accd1d93de 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
> >
> >  source "lib/Kconfig.ubsan"
> >
> > +source "lib/Kconfig.kcsan"
> > +
> >  config ARCH_HAS_DEVMEM_IS_ALLOWED
> >         bool
> >
> > diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> > new file mode 100644
> > index 000000000000..3e1f1acfb24b
> > --- /dev/null
> > +++ b/lib/Kconfig.kcsan
> > @@ -0,0 +1,88 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config HAVE_ARCH_KCSAN
> > +       bool
> > +
> > +menuconfig KCSAN
> > +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> > +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> > +       default n
> > +       help
> > +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> > +         uses a watchpoint-based sampling approach to detect races.
> > +
> > +if KCSAN
> > +
> > +config KCSAN_SELFTEST
> > +       bool "KCSAN: perform short selftests on boot"
> > +       default y
> > +       help
> > +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> > +
> > +config KCSAN_EARLY_ENABLE
> > +       bool "KCSAN: early enable"
> > +       default y
> > +       help
> > +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> > +         later be enabled/disabled via debugfs.
> > +
> > +config KCSAN_UDELAY_MAX_TASK
> > +       int "KCSAN: maximum delay in microseconds (for tasks)"
> > +       default 80
> > +       help
> > +         For tasks, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_UDELAY_MAX_INTERRUPT
> > +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> > +       default 20
> > +       help
> > +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_DELAY_RANDOMIZE
> > +       bool "KCSAN: randomize delays"
> > +       default y
> > +       help
> > +         If delays should be randomized; if false, the chosen delay is simply
> > +         the respective maximum value defined above.
> > +
> > +config KCSAN_WATCH_SKIP_INST
> > +       int "KCSAN: watchpoint instruction skip"
> > +       default 2000
> > +       help
> > +         The number of per-CPU memory operations to skip watching, before
> > +         another watchpoint is set up; in other words, 1 in
> > +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> > +         watchpoint. A smaller value results in more aggressive race
> > +         detection, whereas a larger value improves system performance at the
> > +         cost of missing some races.
> > +
> > +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +       bool "KCSAN: report races of unknown origin"
> > +       default y
> > +       help
> > +         If KCSAN should report races where only one access is known, and the
> > +         conflicting access is of unknown origin. This type of race is
> > +         reported if it was only possible to infer a race due to a data-value
> > +         change while an access is being delayed on a watchpoint.
> > +
> > +config KCSAN_IGNORE_ATOMICS
> > +       bool "KCSAN: do not instrument marked atomic accesses"
> > +       default n
> > +       help
> > +         If enabled, never instruments marked atomic accesses. This results in
> > +         not reporting data-races where one access is atomic and the other is
> > +         a plain access.
> > +
> Isn't it better to decide at runtime, whether we want to ignore atomics or not?

See below.

> > +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> > +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> > +       default n
> > +       help
> > +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> > +         This option should only be used to prune initial data-races found in
> > +         existing code.
> Overall, I think it's better to make most of these configs boot-time flags.
> This way one won't need to rebuild the kernel every time they want to
> turn some option on or off.

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-21 15:54       ` Marco Elver
From: Marco Elver @ 2019-10-21 15:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern, Andrea Parri,
	Andrey Konovalov, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Boqun Feng, Borislav Petkov, Daniel Axtens, Daniel Lustig,
	dave.hansen, David Howells, Dmitriy Vyukov, H. Peter Anvin,
	Ingo Molnar, Jade Alglave, Joel Fernandes, Jonathan Corbet,
	Josh Poimboeuf, Luc Maranget, Mark Rutland, Nicholas Piggin,
	Paul McKenney, Peter Zijlstra, Thomas Gleixner, Will Deacon,
	kasan-dev, linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On Mon, 21 Oct 2019 at 15:37, Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> >
> > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> > kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> > See the included Documentation/dev-tools/kcsan.rst for more details.
> >
> > This patch adds basic infrastructure, but does not yet enable KCSAN for
> > any architecture.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Elaborate comment about instrumentation calls emitted by compilers.
> > * Replace kcsan_check_access(.., {true, false}) with
> >   kcsan_check_{read,write} for improved readability.
> > * Change bug title of race of unknown origin to just say "data-race in".
> > * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> > * Add comment about safety of find_watchpoint without user_access_save.
> > * Remove unnecessary preempt_disable/enable and elaborate on comment why
> >   we want to disable interrupts and preemptions.
> > * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
> >   contexts [Suggested by Mark Rutland].
> > ---
> >  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
> >  MAINTAINERS                       |  11 +
> >  Makefile                          |   3 +-
> >  include/linux/compiler-clang.h    |   9 +
> >  include/linux/compiler-gcc.h      |   7 +
> >  include/linux/compiler.h          |  35 ++-
> >  include/linux/kcsan-checks.h      | 147 ++++++++++
> >  include/linux/kcsan.h             | 108 ++++++++
> >  include/linux/sched.h             |   4 +
> >  init/init_task.c                  |   8 +
> >  init/main.c                       |   2 +
> >  kernel/Makefile                   |   1 +
> >  kernel/kcsan/Makefile             |  14 +
> >  kernel/kcsan/atomic.c             |  21 ++
> >  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
> >  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
> >  kernel/kcsan/encoding.h           |  94 +++++++
> >  kernel/kcsan/kcsan.c              |  86 ++++++
> >  kernel/kcsan/kcsan.h              | 140 ++++++++++
> >  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
> >  kernel/kcsan/test.c               | 117 ++++++++
> >  lib/Kconfig.debug                 |   2 +
> >  lib/Kconfig.kcsan                 |  88 ++++++
> >  lib/Makefile                      |   3 +
> >  scripts/Makefile.kcsan            |   6 +
> >  scripts/Makefile.lib              |  10 +
> >  26 files changed, 2069 insertions(+), 9 deletions(-)
> >  create mode 100644 Documentation/dev-tools/kcsan.rst
> >  create mode 100644 include/linux/kcsan-checks.h
> >  create mode 100644 include/linux/kcsan.h
> >  create mode 100644 kernel/kcsan/Makefile
> >  create mode 100644 kernel/kcsan/atomic.c
> >  create mode 100644 kernel/kcsan/core.c
> >  create mode 100644 kernel/kcsan/debugfs.c
> >  create mode 100644 kernel/kcsan/encoding.h
> >  create mode 100644 kernel/kcsan/kcsan.c
> >  create mode 100644 kernel/kcsan/kcsan.h
> >  create mode 100644 kernel/kcsan/report.c
> >  create mode 100644 kernel/kcsan/test.c
> >  create mode 100644 lib/Kconfig.kcsan
> >  create mode 100644 scripts/Makefile.kcsan
> >
> > diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> > new file mode 100644
> > index 000000000000..497b09e5cc96
> > --- /dev/null
> > +++ b/Documentation/dev-tools/kcsan.rst
> > @@ -0,0 +1,203 @@
> > +The Kernel Concurrency Sanitizer (KCSAN)
> > +========================================
> > +
> > +Overview
> > +--------
> > +
> > +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> > +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> > +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> > +detector. Key priorities in KCSAN's design are lack of false positives,
> > +scalability, and simplicity. More details can be found in `Implementation
> > +Details`_.
> > +
> > +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> > +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> > +With Clang it requires version 7.0.0 or later.
> > +
> > +Usage
> > +-----
> > +
> > +To enable KCSAN configure kernel with::
> > +
> > +    CONFIG_KCSAN = y
> > +
> > +KCSAN provides several other configuration options to customize behaviour (see
> > +their respective help text for more info).
> > +
> > +debugfs
> > +~~~~~~~
> > +
> > +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> > +
> > +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> > +  ``/sys/kernel/debug/kcsan``.
> > +
> > +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> > +  ``some_func_name`` to the report filter list, which (by default) blacklists
> > +  reporting data-races where either one of the top stackframes is a function
> > +  in the list.
> > +
> > +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> > +  changes the report filtering behaviour. For example, the blacklist feature
> > +  can be used to silence frequently occurring data-races; the whitelist feature
> > +  can help with reproduction and testing of fixes.
> > +
> > +Error reports
> > +~~~~~~~~~~~~~
> > +
> > +A typical data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> > +
> > +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> > +     kernfs_refresh_inode+0x70/0x170
> > +     kernfs_iop_permission+0x4f/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     vfs_statx+0x9b/0x130
> > +     __do_sys_newlstat+0x50/0xb0
> > +     __x64_sys_newlstat+0x37/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> > +     generic_permission+0x5b/0x2a0
> > +     kernfs_iop_permission+0x66/0x90
> > +     inode_permission+0x190/0x200
> > +     link_path_walk.part.0+0x503/0x8e0
> > +     path_lookupat.isra.0+0x69/0x4d0
> > +     filename_lookup+0x136/0x280
> > +     user_path_at_empty+0x47/0x60
> > +     do_faccessat+0x11a/0x390
> > +     __x64_sys_access+0x3c/0x50
> > +     do_syscall_64+0x85/0x260
> > +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +The header of the report provides a short summary of the functions involved in
> > +the race. It is followed by the access types and stack traces of the 2 threads
> > +involved in the data-race.
> > +
> > +The other less common type of data-race report looks like this::
> > +
> > +    ==================================================================
> > +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> > +
> > +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> > +     e1000_clean_rx_irq+0x551/0xb10
> > +     e1000_clean+0x533/0xda0
> > +     net_rx_action+0x329/0x900
> > +     __do_softirq+0xdb/0x2db
> > +     irq_exit+0x9b/0xa0
> > +     do_IRQ+0x9c/0xf0
> > +     ret_from_intr+0x0/0x18
> > +     default_idle+0x3f/0x220
> > +     arch_cpu_idle+0x21/0x30
> > +     do_idle+0x1df/0x230
> > +     cpu_startup_entry+0x14/0x20
> > +     rest_init+0xc5/0xcb
> > +     arch_call_rest_init+0x13/0x2b
> > +     start_kernel+0x6db/0x700
> > +
> > +    Reported by Kernel Concurrency Sanitizer on:
> > +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> > +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > +    ==================================================================
> > +
> > +This report is generated where it was not possible to determine the other
> > +racing thread, but a race was inferred due to the data-value of the watched
> > +memory location having changed. These can occur either due to missing
> > +instrumentation or e.g. DMA accesses.
> > +
> > +Data-Races
> > +----------
> Nit: I was under the impression "data races" were commonly written
> without a hyphen. I may be mistaken.

Thanks. I've updated it everywhere except in bug titles, which should
remain as-is.

> > +
> > +Informally, two operations *conflict* if they access the same memory location,
> > +and at least one of them is a write operation. In an execution, two memory
> > +operations from different threads form a **data-race** if they *conflict*, at
> > +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> > +the "happens-before" order according to the `LKMM
> > +<../../tools/memory-model/Documentation/explanation.txt>`_.
> > +
> > +Relationship with the Linux Kernel Memory Model (LKMM)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The LKMM defines the propagation and ordering rules of various memory
> > +operations, which gives developers the ability to reason about concurrent code.
> > +Ultimately this allows determining the possible executions of concurrent code,
> > +and whether that code is free from data-races.
> > +
> > +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> > +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> > +words, KCSAN assumes that as long as a plain access is not observed to race
> > +with another conflicting access, memory operations are correctly ordered.
> > +
> > +This means that KCSAN will not report *potential* data-races due to missing
> > +memory ordering. If, however, missing memory ordering (that is observable with
> > +a particular compiler and architecture) leads to an observable data-race (e.g.
> > +entering a critical section erroneously), KCSAN would report the resulting
> > +data-race.
> > +
> > +Implementation Details
> > +----------------------
> > +
> > +The general approach is inspired by `DataCollider
> > +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> > +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> > +relies on compiler instrumentation. Watchpoints are implemented using an
> > +efficient encoding that stores access type, size, and address in a long; the
> > +benefits of using "soft watchpoints" are portability and greater flexibility in
> > +limiting which accesses trigger a watchpoint.
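To make the "access type, size, and address in a long" encoding concrete,
here is a minimal userspace sketch. It is not the contents of encoding.h
(which is not quoted here): the bit layout, the SK_* names and the 14-bit
size field are assumptions chosen for illustration. Only the general shape
is taken from the patch, namely that zero means "invalid" and that decoding
reports validity through its return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [63] write flag | [62:49] size | [48:0] masked address. */
#define SK_SIZE_BITS  14
#define SK_ADDR_BITS  (64 - 1 - SK_SIZE_BITS)
#define SK_WRITE_MASK (UINT64_C(1) << 63)
#define SK_ADDR_MASK  ((UINT64_C(1) << SK_ADDR_BITS) - 1)
#define SK_SIZE_MASK  (~(SK_WRITE_MASK | SK_ADDR_MASK))
#define SK_INVALID    UINT64_C(0) /* zero-initialized slot == no watchpoint */

/* Pack one access into a single word so it can be updated atomically. */
static uint64_t sk_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? SK_WRITE_MASK : 0) |
	       (((uint64_t)size << SK_ADDR_BITS) & SK_SIZE_MASK) |
	       (addr & SK_ADDR_MASK);
}

/* Unpack a word; returns false if the slot holds no watchpoint. */
static bool sk_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	if (wp == SK_INVALID)
		return false;

	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp & SK_SIZE_MASK) >> SK_ADDR_BITS);
	*is_write = (wp & SK_WRITE_MASK) != 0;
	return true;
}

/* Round-trip check for the sketch above. */
static void sk_encoding_demo(void)
{
	uint64_t addr;
	size_t size;
	bool is_write;

	assert(sk_decode(sk_encode(0x1234, 8, true), &addr, &size, &is_write));
	assert(addr == 0x1234 && size == 8 && is_write);
	assert(!sk_decode(SK_INVALID, &addr, &size, &is_write));
}
```

Because the whole watchpoint fits in one word, a slot can be claimed or
released with a single atomic operation, which is what lets the fast path
avoid shared locks.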
> > +
> > +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> > +memory operations; for each instrumented plain access:
> > +
> > +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> > +   write, then we encountered a racing access.
> > +
> > +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> > +   stall for a short delay.
> > +
> > +3. Also check the data value before the delay, and re-check the data value
> > +   after the delay; if the values mismatch, we infer a race of unknown origin.
> > +
> > +To detect data-races between plain and atomic memory operations, KCSAN also
> > +annotates atomic accesses, but only to check if a watchpoint exists
> > +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> > +accesses.
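The three steps above can be modelled in userspace roughly as follows.
This is a simplified single-slot, single-threaded sketch and not the code
in kernel/kcsan/core.c: every model_* name is made up for illustration,
and the `concurrent` callback is an assumption that stands in for whatever
another CPU or a device does while the watching thread is stalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-slot watchpoint; the patch uses an array of longs. */
struct soft_watchpoint {
	bool valid;
	bool consumed;
	bool is_write;
	uintptr_t addr;
	size_t size;
};

enum model_result { MODEL_NO_RACE, MODEL_RACE, MODEL_RACE_UNKNOWN_ORIGIN };

static struct soft_watchpoint model_wp;
static int model_shared;

/* Step 1 helper: overlapping ranges, and at least one side is a write. */
static bool model_wp_conflicts(const void *ptr, size_t size, bool is_write)
{
	uintptr_t a = (uintptr_t)ptr;

	if (!model_wp.valid || !(is_write || model_wp.is_write))
		return false;
	return a < model_wp.addr + model_wp.size && model_wp.addr < a + size;
}

static enum model_result model_check_access(void *ptr, size_t size,
					    bool is_write, bool sampled,
					    void (*concurrent)(void))
{
	unsigned char before[16];
	bool changed, consumed;

	/* Step 1: a matching watchpoint means this access is racing. */
	if (model_wp_conflicts(ptr, size, is_write)) {
		model_wp.consumed = true;
		return MODEL_RACE;
	}

	/* Step 2: only sampled accesses set up a watchpoint and stall. */
	if (!sampled || model_wp.valid || size > sizeof(before))
		return MODEL_NO_RACE;

	model_wp = (struct soft_watchpoint){
		.valid = true, .is_write = is_write,
		.addr = (uintptr_t)ptr, .size = size,
	};
	memcpy(before, ptr, size);	/* step 3: value before the delay */

	if (concurrent)
		concurrent();		/* stand-in for the delay */

	changed = memcmp(before, ptr, size) != 0;
	consumed = model_wp.consumed;
	model_wp.valid = false;

	if (consumed)
		return MODEL_RACE;	/* an instrumented access hit us */
	return changed ? MODEL_RACE_UNKNOWN_ORIGIN : MODEL_NO_RACE;
}

/* A racing writer that is instrumented, so it checks the watchpoint. */
static void model_instrumented_writer(void)
{
	model_check_access(&model_shared, sizeof(model_shared), true, false, NULL);
	model_shared++;
}

/* A racing writer that is not instrumented (think DMA). */
static void model_uninstrumented_writer(void)
{
	model_shared++;
}
```

With this model, a sampled read of model_shared that is interrupted by
model_instrumented_writer() yields MODEL_RACE, while the same read
interrupted by model_uninstrumented_writer() yields
MODEL_RACE_UNKNOWN_ORIGIN, which corresponds to the "race of unknown
origin" report type.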
> > +
> > +Key Properties
> > +~~~~~~~~~~~~~~
> > +
> > +1. **Memory Overhead:** No shadow memory is required. The current
> > +   implementation uses a small array of longs to encode watchpoint information,
> > +   which is negligible.
> > +
> > +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> > +   efficient watchpoint encoding that does not require acquiring any shared
> > +   locks in the fast-path. For kernel boot with a default config on a system
> > +   where nproc=8 we measure a slow-down of 10-15x.
> > +
> > +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> > +   may result in missed data-races (false negatives), compared to a
> > +   happens-before data-race detector.
> > +
> > +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> > +
> > +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> > +   runtime. With a happens-before data-race detector, any omission leads to
> > +   false positives, which is especially important in the context of the kernel
> > +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> > +   result, maintenance overheads are minimal as the kernel evolves.
> > +
> > +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> > +   setting up watchpoints, racy writes from devices can also be detected.
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 0154674cbad3..71f7fb625490 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
> >  F:     scripts/kconfig/
> >  F:     scripts/Kconfig.include
> >
> > +KCSAN
> > +M:     Marco Elver <elver@google.com>
> > +R:     Dmitry Vyukov <dvyukov@google.com>
> > +L:     kasan-dev@googlegroups.com
> > +S:     Maintained
> > +F:     Documentation/dev-tools/kcsan.rst
> > +F:     include/linux/kcsan*.h
> > +F:     kernel/kcsan/
> > +F:     lib/Kconfig.kcsan
> > +F:     scripts/Makefile.kcsan
> > +
> >  KDUMP
> >  M:     Dave Young <dyoung@redhat.com>
> >  M:     Baoquan He <bhe@redhat.com>
> > diff --git a/Makefile b/Makefile
> > index ffd7a912fc46..ad4729176252 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
> >
> >  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
> >  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> > -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> > +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
> >  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
> >  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
> >  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> > @@ -900,6 +900,7 @@ endif
> >  include scripts/Makefile.kasan
> >  include scripts/Makefile.extrawarn
> >  include scripts/Makefile.ubsan
> > +include scripts/Makefile.kcsan
> >
> >  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
> >  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > index 333a6695a918..a213eb55e725 100644
> > --- a/include/linux/compiler-clang.h
> > +++ b/include/linux/compiler-clang.h
> > @@ -24,6 +24,15 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_feature(thread_sanitizer)
> > +/* emulate gcc's __SANITIZE_THREAD__ flag */
> > +#define __SANITIZE_THREAD__
> > +#define __no_sanitize_thread \
> > +               __attribute__((no_sanitize("thread")))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  /*
> >   * Not all versions of clang implement the the type-generic versions
> >   * of the builtin overflow checkers. Fortunately, clang implements
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index d7ee4c6bad48..de105ca29282 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -145,6 +145,13 @@
> >  #define __no_sanitize_address
> >  #endif
> >
> > +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> > +#define __no_sanitize_thread                                                   \
> > +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> > +#else
> > +#define __no_sanitize_thread
> > +#endif
> > +
> >  #if GCC_VERSION >= 50100
> >  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
> >  #endif
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index 5e88e7e33abe..350d80dbee4d 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >  #endif
> >
> >  #include <uapi/linux/types.h>
> > +#include <linux/kcsan-checks.h>
> >
> >  #define __READ_ONCE_SIZE                                               \
> >  ({                                                                     \
> > @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >         }                                                               \
> >  })
> >
> > -static __always_inline
> > -void __read_once_size(const volatile void *p, void *res, int size)
> > -{
> > -       __READ_ONCE_SIZE;
> > -}
> > -
> >  #ifdef CONFIG_KASAN
> >  /*
> >   * We can't declare function 'inline' because __no_sanitize_address confilcts
> > @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
> >  # define __no_kasan_or_inline __always_inline
> >  #endif
> >
> > -static __no_kasan_or_inline
> > +#ifdef CONFIG_KCSAN
> > +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +# define __no_kcsan_or_inline __always_inline
> > +#endif
> > +
> > +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> > +/* Avoid any instrumentation or inline. */
> > +#define __no_sanitize_or_inline                                                \
> > +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> > +#else
> > +#define __no_sanitize_or_inline __always_inline
> > +#endif
> > +
> > +static __no_kcsan_or_inline
> > +void __read_once_size(const volatile void *p, void *res, int size)
> > +{
> > +       kcsan_check_atomic_read((const void *)p, size);
> > +       __READ_ONCE_SIZE;
> > +}
> > +
> > +static __no_sanitize_or_inline
> >  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
> >  {
> >         __READ_ONCE_SIZE;
> >  }
> >
> > -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> > +static __no_kcsan_or_inline
> > +void __write_once_size(volatile void *p, void *res, int size)
> >  {
> > +       kcsan_check_atomic_write((const void *)p, size);
> > +
> >         switch (size) {
> >         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
> >         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> > diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> > new file mode 100644
> > index 000000000000..4203603ae852
> > --- /dev/null
> > +++ b/include/linux/kcsan-checks.h
> > @@ -0,0 +1,147 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_CHECKS_H
> > +#define _LINUX_KCSAN_CHECKS_H
> > +
> > +#include <linux/types.h>
> > +
> > +/*
> > + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> > + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> > + * to validate access to an address.   Never use these in header files!
> > + */
> > +#ifdef CONFIG_KCSAN
> > +/**
> > + * __kcsan_check_watchpoint - check if a watchpoint exists
> > + *
> > + * Returns true if no race was detected, and we may then proceed to set up a
> > + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> > + * encountered, and we may not set up a watchpoint after.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + * @return true if no race was detected, false otherwise.
> > + */
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> I think the parameter indentations are a bit off here and below (I've
> also looked at the Github diff);
> have you considered running checkpatch.pl?

It was formatted with clang-format; the indentation is correct with
8-space tabs, and checkpatch.pl is happy.

> > +
> > +/**
> > + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> > + *
> > + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> > + * reports the data-race.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + * @is_write is access a write
> > + */
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write);
> > +#else
> > +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> > +                                           size_t size, bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/*
> > + * kcsan_*: Only available when the particular compilation unit has KCSAN
> > + * instrumentation enabled. May be used in header files.
> > + */
> > +#ifdef __SANITIZE_THREAD__
> > +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> > +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> > +#else
> > +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +       return true;
> > +}
> > +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                                         bool is_write)
> > +{
> > +}
> > +#endif
> > +
> > +/**
> > + * __kcsan_check_read - check regular read access for data-races
> > + *
> > + * Full read access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled. Note that, setting up watchpoints for plain reads is
> > + * required to also detect data-races with atomic accesses.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_read(ptr, size)                                          \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> > +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> > +       } while (0)
> > +
> > +/**
> > + * __kcsan_check_write - check regular write access for data-races
> > + *
> > + * Full write access that checks watchpoint and sets up a watchpoint if this
> > + * access is sampled.
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define __kcsan_check_write(ptr, size)                                         \
> > +       do {                                                                   \
> > +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_read - check regular read access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_read(ptr, size)                                            \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> > +                       kcsan_setup_watchpoint(ptr, size, false);              \
> > +       } while (0)
> > +
> > +/**
> > + * kcsan_check_write - check regular write access for data-races
> > + *
> > + * @ptr address of access
> > + * @size size of access
> > + */
> > +#define kcsan_check_write(ptr, size)                                           \
> > +       do {                                                                   \
> > +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> > +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> > +                       kcsan_setup_watchpoint(ptr, size, true);               \
> > +       } while (0)
> > +
> > +/*
> > + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> > + * aliases to kcsan_check_watchpoint; otherwise it becomes a no-op.
> > + */
> > +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> > +#define kcsan_check_atomic_read(...)                                           \
> > +       do {                                                                   \
> > +       } while (0)
> > +#define kcsan_check_atomic_write(...)                                          \
> > +       do {                                                                   \
> > +       } while (0)
> > +#else
> > +#define kcsan_check_atomic_read(ptr, size)                                     \
> > +       kcsan_check_watchpoint(ptr, size, false)
> > +#define kcsan_check_atomic_write(ptr, size)                                    \
> > +       kcsan_check_watchpoint(ptr, size, true)
> > +#endif
> > +
> > +#endif /* _LINUX_KCSAN_CHECKS_H */
> > diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> > new file mode 100644
> > index 000000000000..fd5de2ba3a16
> > --- /dev/null
> > +++ b/include/linux/kcsan.h
> > @@ -0,0 +1,108 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_H
> > +#define _LINUX_KCSAN_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +#ifdef CONFIG_KCSAN
> > +
> > +/*
> > + * Context for each thread of execution: for tasks, this is stored in
> > + * task_struct, and interrupts access internal per-CPU storage.
> > + */
> > +struct kcsan_ctx {
> > +       int disable; /* disable counter */
> > +       int atomic_next; /* number of following atomic ops */
> > +
> > +       /*
> > +        * We use separate variables to store if we are in a nestable or flat
> > +        * atomic region. This helps make sure that an atomic region with
> > +        * nesting support is not suddenly aborted when a flat region is
> > +        * contained within. Effectively this allows supporting nesting flat
> > +        * atomic regions within an outer nestable atomic region. Support for
> > +        * this is required as there are cases where a seqlock reader critical
> > +        * section (flat atomic region) is contained within a seqlock writer
> > +        * critical section (nestable atomic region), and the "mismatching
> > +        * kcsan_end_atomic()" warning would trigger otherwise.
> > +        */
> > +       int atomic_region;
> > +       bool atomic_region_flat;
> > +};
> > +
> > +/**
> > + * kcsan_init - initialize KCSAN runtime
> > + */
> > +void kcsan_init(void);
> > +
> > +/**
> > + * kcsan_disable_current - disable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_disable_current(void);
> > +
> > +/**
> > + * kcsan_enable_current - re-enable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_enable_current(void);
> > +
> > +/**
> > + * kcsan_begin_atomic - use to denote an atomic region
> > + *
> > + * Accesses within the atomic region may appear to race with other accesses but
> > + * should be considered atomic.
> > + *
> > + * @nest true if regions may be nested, or false for flat region
> > + */
> > +void kcsan_begin_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_end_atomic - end atomic region
> > + *
> > + * @nest must match argument to kcsan_begin_atomic().
> > + */
> > +void kcsan_end_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_atomic_next - consider following accesses as atomic
> > + *
> > + * Force treating the next n memory accesses for the current context as atomic
> > + * operations.
> > + *
> > + * @n number of following memory accesses to treat as atomic.
> > + */
> > +void kcsan_atomic_next(int n);
> > +
> > +#else /* CONFIG_KCSAN */
> > +
> > +static inline void kcsan_init(void)
> I think it should be ok to put {} on the same line with the function
> prototype here, see e.g. include/linux/kasan.h

Done @ v3.

> > +{
> > +}
> > +
> > +static inline void kcsan_disable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_enable_current(void)
> > +{
> > +}
> > +
> > +static inline void kcsan_begin_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_end_atomic(bool nest)
> > +{
> > +}
> > +
> > +static inline void kcsan_atomic_next(int n)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_KCSAN */
> > +
> > +#endif /* _LINUX_KCSAN_H */
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 2c2e56bd8913..9490e417bf4a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -31,6 +31,7 @@
> >  #include <linux/task_io_accounting.h>
> >  #include <linux/posix-timers.h>
> >  #include <linux/rseq.h>
> > +#include <linux/kcsan.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> >  struct audit_context;
> > @@ -1171,6 +1172,9 @@ struct task_struct {
> >  #ifdef CONFIG_KASAN
> >         unsigned int                    kasan_depth;
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       struct kcsan_ctx                kcsan_ctx;
> > +#endif
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         /* Index of current stored address in ret_stack: */
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 9e5cbe5eab7b..e229416c3314 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -161,6 +161,14 @@ struct task_struct init_task
> >  #ifdef CONFIG_KASAN
> >         .kasan_depth    = 1,
> >  #endif
> > +#ifdef CONFIG_KCSAN
> > +       .kcsan_ctx = {
> > +               .disable                = 1,
> > +               .atomic_next            = 0,
> > +               .atomic_region          = 0,
> > +               .atomic_region_flat     = 0,
> > +       },
> > +#endif
> >  #ifdef CONFIG_TRACE_IRQFLAGS
> >         .softirqs_enabled = 1,
> >  #endif
> > diff --git a/init/main.c b/init/main.c
> > index 91f6ebb30ef0..4d814de017ee 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -93,6 +93,7 @@
> >  #include <linux/rodata_test.h>
> >  #include <linux/jump_label.h>
> >  #include <linux/mem_encrypt.h>
> > +#include <linux/kcsan.h>
> >
> >  #include <asm/io.h>
> >  #include <asm/bugs.h>
> > @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
> >         acpi_subsystem_init();
> >         arch_post_acpi_subsys_init();
> >         sfi_init_late();
> > +       kcsan_init();
> >
> >         /* Do the rest non-__init'ed, we're now alive */
> >         arch_call_rest_init();
> > diff --git a/kernel/Makefile b/kernel/Makefile
> > index daad787fb795..74ab46e2ebd1 100644
> > --- a/kernel/Makefile
> > +++ b/kernel/Makefile
> > @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
> >  obj-$(CONFIG_IRQ_WORK) += irq_work.o
> >  obj-$(CONFIG_CPU_PM) += cpu_pm.o
> >  obj-$(CONFIG_BPF) += bpf/
> > +obj-$(CONFIG_KCSAN) += kcsan/
> >
> >  obj-$(CONFIG_PERF_EVENTS) += events/
> >
> > diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> > new file mode 100644
> > index 000000000000..c25f07062d26
> > --- /dev/null
> > +++ b/kernel/kcsan/Makefile
> > @@ -0,0 +1,14 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +KCSAN_SANITIZE := n
> > +KCOV_INSTRUMENT := n
> > +
> > +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> > +
> > +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> > +
> > +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> > +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> > diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> > new file mode 100644
> > index 000000000000..dd44f7d9e491
> > --- /dev/null
> > +++ b/kernel/kcsan/atomic.c
> > @@ -0,0 +1,21 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/jiffies.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * List all volatile globals that have been observed in races, to suppress
> > + * data-race reports between accesses to these variables.
> > + *
> > + * For now, we assume that volatile accesses of globals are as strong as atomic
> > + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> > + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> > + * than cast to volatile. Eventually, we hope to be able to remove this
> > + * function.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr)
> > +{
> > +       /* only jiffies for now */
> > +       return ptr == &jiffies;
> > +}
> > diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> > new file mode 100644
> > index 000000000000..bc8d60b129eb
> > --- /dev/null
> > +++ b/kernel/kcsan/core.c
> > @@ -0,0 +1,428 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bug.h>
> > +#include <linux/delay.h>
> > +#include <linux/export.h>
> > +#include <linux/init.h>
> > +#include <linux/percpu.h>
> > +#include <linux/preempt.h>
> > +#include <linux/random.h>
> > +#include <linux/sched.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Helper macros to iterate slots, starting from address slot itself, followed
> > + * by the right and left slots.
> > + */
> > +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> > +#define SLOT_IDX(slot, i)                                                      \
> > +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> > +                 KCSAN_CHECK_ADJACENT)) %                                     \
> > +        KCSAN_NUM_WATCHPOINTS)
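To illustrate the visiting order that the comment describes (the address
slot first, then its right neighbour, then its left neighbour), here is a
small standalone sketch of the same arithmetic. The values chosen for
SK_CHECK_ADJACENT and SK_NUM_WATCHPOINTS are assumptions, since the real
definitions live in kcsan.h and are not quoted here.

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define SK_CHECK_ADJACENT  1
#define SK_NUM_WATCHPOINTS 64

#define SK_CHECK_NUM_SLOTS (1 + 2 * SK_CHECK_ADJACENT)
/* Same arithmetic as SLOT_IDX() in the patch, with arguments parenthesized. */
#define SK_SLOT_IDX(slot, i)                                                   \
	(((slot) + ((((i) + SK_CHECK_ADJACENT) % SK_CHECK_NUM_SLOTS) -         \
		    SK_CHECK_ADJACENT)) %                                      \
	 SK_NUM_WATCHPOINTS)

static void sk_slot_order_demo(void)
{
	/* For a slot in the middle of the array: self, right, left. */
	assert(SK_SLOT_IDX(10, 0) == 10);
	assert(SK_SLOT_IDX(10, 1) == 11);
	assert(SK_SLOT_IDX(10, 2) == 9);

	/* The right neighbour of the last slot wraps to slot 0. */
	assert(SK_SLOT_IDX(63, 1) == 0);
}
```

The sketch deliberately only exercises slots away from index 0: how the
left neighbour of slot 0 wraps depends on the operand types used in the
real macro, which this excerpt does not show.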
> > +
> > +bool kcsan_enabled;
> > +
> > +/* Per-CPU kcsan_ctx for interrupts */
> > +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> > +       .disable = 0,
> > +       .atomic_next = 0,
> > +       .atomic_region = 0,
> > +       .atomic_region_flat = 0,
> > +};
> > +
> > +/*
> > + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> > + * able to safely update and access a watchpoint without introducing locking
> > + * overhead, we encode each watchpoint as a single atomic long. The initial
> > + * zero-initialized state matches INVALID_WATCHPOINT.
> > + */
> > +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> > +
> > +/*
> > + * Instructions skipped counter; see should_watch().
> > + */
> > +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> > +
> > +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> > +                                            bool expect_write,
> > +                                            long *encoded_watchpoint)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> > +       atomic_long_t *watchpoint;
> > +       unsigned long wp_addr_masked;
> > +       size_t wp_size;
> > +       bool is_write;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               *encoded_watchpoint = atomic_long_read(watchpoint);
> > +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> > +                                      &wp_size, &is_write))
> > +                       continue;
> > +
> > +               if (expect_write && !is_write)
> > +                       continue;
> > +
> > +               /* Check if the watchpoint matches the access. */
> > +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> > +                                              bool is_write)
> > +{
> > +       const int slot = watchpoint_slot(addr);
> > +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> > +       atomic_long_t *watchpoint;
> > +       int i;
> > +
> > +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +               long expect_val = INVALID_WATCHPOINT;
> > +
> > +               /* Try to acquire this slot. */
> > +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> > +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> > +                                                   encoded_watchpoint))
> > +                       return watchpoint;
> > +       }
> > +
> > +       return NULL;
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was successfully consumed, false otherwise.
> > + *
> > + * This may return false if:
> > + *
> > + *     1. another thread already consumed the watchpoint;
> > + *     2. the thread that set up the watchpoint already removed it;
> > + *     3. the watchpoint was removed and then re-used.
> > + */
> > +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> > +                                         long encoded_watchpoint)
> > +{
> > +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> > +                                              CONSUMED_WATCHPOINT);
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was not touched, false if consumed.
> > + */
> > +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> > +{
> > +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> > +              CONSUMED_WATCHPOINT;
> > +}
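The slot life cycle implemented by insert_watchpoint(),
try_consume_watchpoint() and remove_watchpoint() can be sketched in
userspace with C11 atomics. This is only an approximation under stated
assumptions: the sk_* names are hypothetical, a single slot is used
instead of an array, and the value of SK_CONSUMED_WATCHPOINT is a
placeholder because the real constant is defined in encoding.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_INVALID_WATCHPOINT  0L
#define SK_CONSUMED_WATCHPOINT 1L /* placeholder value, assumed */

/* Claim a free slot: succeeds only if the slot is currently invalid. */
static bool sk_insert(atomic_long *slot, long encoded)
{
	long expect = SK_INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Mark the slot consumed: succeeds only if it still holds 'encoded'. */
static bool sk_try_consume(atomic_long *slot, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &encoded, SK_CONSUMED_WATCHPOINT,
		memory_order_relaxed, memory_order_relaxed);
}

/* Release the slot; returns true if nobody consumed it in the meantime. */
static bool sk_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, SK_INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       SK_CONSUMED_WATCHPOINT;
}

static void sk_watchpoint_demo(void)
{
	atomic_long slot;

	atomic_init(&slot, SK_INVALID_WATCHPOINT);

	assert(sk_insert(&slot, 0x1000));	/* free -> watching */
	assert(!sk_insert(&slot, 0x2000));	/* slot already taken */
	assert(sk_try_consume(&slot, 0x1000));	/* racing access found it */
	assert(!sk_try_consume(&slot, 0x1000));	/* only one consumer wins */
	assert(!sk_remove(&slot));		/* owner learns of the race */

	assert(sk_insert(&slot, 0x1000));	/* slot is reusable */
	assert(sk_remove(&slot));		/* untouched: no race seen */
}
```

The compare-and-exchange in sk_try_consume() is what covers the three
"may return false" cases listed in the comment: in each of them the slot
no longer holds the encoded value that the consumer originally read.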
> > +
> > +static inline struct kcsan_ctx *get_ctx(void)
> > +{
> > +       /*
> > +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> > +        * also result in calls that generate warnings in uaccess regions.
> > +        */
> > +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> > +}
> > +
> > +
> > +static inline bool is_atomic(const volatile void *ptr)
> > +{
> > +       struct kcsan_ctx *ctx = get_ctx();
> > +
> > +       if (unlikely(ctx->atomic_next > 0)) {
> > +               --ctx->atomic_next;
> > +               return true;
> > +       }
> > +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> > +               return true;
> Won't ctx->atomic_region suffice for both flat and non-flat regions?
> (Do we really need the flat ones?)

The comment in include/linux/kcsan.h explains:
/*
* We use separate variables to store if we are in a nestable or flat
* atomic region. This helps make sure that an atomic region with
* nesting support is not suddenly aborted when a flat region is
* contained within. Effectively this allows supporting nesting flat
* atomic regions within an outer nestable atomic region. Support for
* this is required as there are cases where a seqlock reader critical
* section (flat atomic region) is contained within a seqlock writer
* critical section (nestable atomic region), and the "mismatching
* kcsan_end_atomic()" warning would trigger otherwise.
*/
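
As a concrete illustration, here is a small userspace model of the two
variables next to a single-counter alternative. The model_* functions
mirror the logic of kcsan_begin_atomic()/kcsan_end_atomic() from the
patch; the single_* functions are a hypothetical design invented here
purely to show what would go wrong, and are not code from any version of
the series.

```c
#include <assert.h>
#include <stdbool.h>

struct ctx_model {
	int atomic_region;		/* nestable regions: a counter */
	bool atomic_region_flat;	/* flat region: a flag */
};

static void model_begin_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

/* Returns false where the kernel would WARN about a mismatching end. */
static bool model_end_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest) {
		if (ctx->atomic_region == 0)
			return false;
		--ctx->atomic_region;
	} else {
		ctx->atomic_region_flat = false;
	}
	return true;
}

static bool model_in_atomic(const struct ctx_model *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}

/* Hypothetical alternative: one counter that a flat end resets to 0. */
static void single_begin(int *region, bool nest)
{
	if (nest)
		++*region;
	else if (*region == 0)
		*region = 1;
}

static bool single_end(int *region, bool nest)
{
	if (nest) {
		if (*region == 0)
			return false;	/* would WARN */
		--*region;
	} else {
		*region = 0;
	}
	return true;
}

static void model_demo(void)
{
	struct ctx_model ctx = { 0 };
	int single = 0;

	/* Two variables: reader section inside a writer section. */
	model_begin_atomic(&ctx, true);		/* seqlock writer begins */
	model_begin_atomic(&ctx, false);	/* nested reader begins */
	assert(model_end_atomic(&ctx, false));	/* reader ends */
	assert(model_in_atomic(&ctx));		/* writer region survives */
	assert(model_end_atomic(&ctx, true));	/* writer ends cleanly */
	assert(!model_in_atomic(&ctx));

	/* One variable: the reader's end aborts the writer's region. */
	single_begin(&single, true);
	single_begin(&single, false);
	assert(single_end(&single, false));
	assert(single == 0);			/* outer region is gone */
	assert(!single_end(&single, true));	/* mismatch warning */
}
```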


> > +       return kcsan_is_atomic(ptr);
> > +}
> > +
> > +static inline bool should_watch(const volatile void *ptr)
> > +{
> > +       /*
> > +        * Never set up watchpoints when memory operations are atomic.
> > +        *
> > +        * We need to check this first, because: 1) atomics should not count
> > +        * towards skipped instructions below, and 2) to actually decrement
> > +        * kcsan_atomic_next for each atomic.
> > +        */
> > +       if (is_atomic(ptr))
> > +               return false;
> > +
> > +       /*
> > +        * We use a per-CPU counter, to avoid excessive contention; there is
> > +        * still enough non-determinism for the precise instructions that end up
> > +        * being watched to be mostly unpredictable. Using a PRNG like
> > +        * prandom_u32() turned out to be too slow.
> > +        */
> > +       return (this_cpu_inc_return(kcsan_skip) %
> > +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> > +}
> > +
> > +static inline bool is_enabled(void)
> > +{
> > +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> > +}
> > +
> > +static inline unsigned int get_delay(void)
> > +{
> > +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> > +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> > +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                      ((prandom_u32() % max_delay) + 1) :
> > +                      max_delay;
> > +}
> > +
> > +/* === Public interface ===================================================== */
> > +
> > +void __init kcsan_init(void)
> > +{
> > +       BUG_ON(!in_task());
> > +
> > +       kcsan_debugfs_init();
> > +       kcsan_enable_current();
> > +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> > +       /*
> > +        * We are in the init task, and no other tasks should be running.
> > +        */
> > +       WRITE_ONCE(kcsan_enabled, true);
> > +#endif
> > +}
> > +
> > +/* === Exported interface =================================================== */
> > +
> > +void kcsan_disable_current(void)
> > +{
> > +       ++get_ctx()->disable;
> > +}
> > +EXPORT_SYMBOL(kcsan_disable_current);
> > +
> > +void kcsan_enable_current(void)
> > +{
> > +       if (get_ctx()->disable-- == 0) {
> > +               kcsan_disable_current(); /* restore to 0 */
> > +               kcsan_disable_current();
> > +               WARN(1, "mismatching %s", __func__);
> > +               kcsan_enable_current();
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_enable_current);
> > +
> > +void kcsan_begin_atomic(bool nest)
> > +{
> > +       if (nest)
> > +               ++get_ctx()->atomic_region;
> > +       else
> > +               get_ctx()->atomic_region_flat = true;
> > +}
> > +EXPORT_SYMBOL(kcsan_begin_atomic);
> > +
> > +void kcsan_end_atomic(bool nest)
> > +{
> > +       if (nest) {
> > +               if (get_ctx()->atomic_region-- == 0) {
> > +                       kcsan_begin_atomic(true); /* restore to 0 */
> > +                       kcsan_disable_current();
> > +                       WARN(1, "mismatching %s", __func__);
> > +                       kcsan_enable_current();
> > +               }
> > +       } else {
> > +               get_ctx()->atomic_region_flat = false;
> > +       }
> > +}
> > +EXPORT_SYMBOL(kcsan_end_atomic);
> > +
> > +void kcsan_atomic_next(int n)
> > +{
> > +       get_ctx()->atomic_next = n;
> > +}
> > +EXPORT_SYMBOL(kcsan_atomic_next);
> > +
> > +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       long encoded_watchpoint;
> > +       unsigned long flags;
> > +       enum kcsan_report_type report_type;
> > +
> > +       if (unlikely(!is_enabled()))
> > +               return false;
> > +
> > +       /*
> > +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> > +        * without user_access_save, as the address that ptr points to is only
> > +        * used to check if a watchpoint exists; ptr is never dereferenced.
> > +        */
> > +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> > +                                    &encoded_watchpoint);
> > +       if (watchpoint == NULL)
> > +               return true;
> > +
> > +       flags = user_access_save();
> > +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> > +               /*
> > +                * The other thread may not print any diagnostics, as it has
> > +                * already removed the watchpoint, or another thread consumed
> > +                * the watchpoint before this thread.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_report_races);
> > +               report_type = kcsan_report_race_check_race;
> > +       } else {
> > +               report_type = kcsan_report_race_check;
> > +       }
> > +
> > +       /* Encountered a data-race. */
> > +       kcsan_counter_inc(kcsan_counter_data_races);
> > +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> > +
> > +       user_access_restore(flags);
> > +       return false;
> > +}
> > +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> > +
> > +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> > +                             bool is_write)
> > +{
> > +       atomic_long_t *watchpoint;
> > +       union {
> > +               u8 _1;
> > +               u16 _2;
> > +               u32 _4;
> > +               u64 _8;
> > +       } expect_value;
> > +       bool is_expected = true;
> > +       unsigned long ua_flags = user_access_save();
> > +       unsigned long irq_flags;
> > +
> > +       if (!should_watch(ptr))
> > +               goto out;
> > +
> > +       if (!check_encodable((unsigned long)ptr, size)) {
> > +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> > +               goto out;
> > +       }
> > +
> > +       /*
> > +        * Disable interrupts & preemptions to avoid another thread on the same
> > +        * CPU accessing memory locations for the set up watchpoint; this is to
> > +        * avoid reporting races to e.g. CPU-local data.
> > +        *
> > +        * An alternative would be adding the source CPU to the watchpoint
> > +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> > +        * several problems with this:
> > +        *   1. we should avoid stealing more bits from the watchpoint encoding
> > +        *      as it would affect accuracy, as well as increase performance
> > +        *      overhead in the fast-path;
> > +        *   2. if we are preempted, but there *is* a genuine data-race, we
> > +        *      would *not* report it -- since this is the common case (vs.
> > +        *      CPU-local data accesses), it makes more sense (from a data-race
> > +        *      detection PoV) to simply disable preemptions to ensure as many
> > +        *      tasks as possible run on other CPUs.
> > +        */
> > +       local_irq_save(irq_flags);
> > +
> > +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> > +       if (watchpoint == NULL) {
> > +               /*
> > +                * Out of capacity: the size of `watchpoints`, and the frequency
> > +                * with which `should_watch()` returns true should be tweaked so
> > +                * that this case happens very rarely.
> > +                */
> > +               kcsan_counter_inc(kcsan_counter_no_capacity);
> > +               goto out_unlock;
> > +       }
> > +
> > +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> > +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> > +
> > +       /*
> > +        * Read the current value, to later check and infer a race if the data
> > +        * was modified via a non-instrumented access, e.g. from a device.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +#ifdef CONFIG_KCSAN_DEBUG
> > +       kcsan_disable_current();
> > +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +              is_write ? "write" : "read", size, ptr,
> > +              watchpoint_slot((unsigned long)ptr),
> > +              encode_watchpoint((unsigned long)ptr, size, is_write));
> > +       kcsan_enable_current();
> > +#endif
> > +
> > +       /*
> > +        * Delay this thread, to increase probability of observing a racy
> > +        * conflicting access.
> > +        */
> > +       udelay(get_delay());
> > +
> > +       /*
> > +        * Re-read value, and check if it is as expected; if not, we infer a
> > +        * racy access.
> > +        */
> > +       switch (size) {
> > +       case 1:
> > +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +               break;
> > +       case 2:
> > +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +               break;
> > +       case 4:
> > +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +               break;
> > +       case 8:
> > +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +               break;
> > +       default:
> > +               break; /* ignore; we do not diff the values */
> > +       }
> > +
> > +       /* Check if this access raced with another. */
> > +       if (!remove_watchpoint(watchpoint)) {
> > +               /*
> > +                * No need to increment 'race' counter, as the racing thread
> > +                * already did.
> > +                */
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_setup);
> > +       } else if (!is_expected) {
> > +               /* Inferring a race, since the value should not have changed. */
> > +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                            kcsan_report_race_unknown_origin);
> > +#endif
> > +       }
> > +
> > +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> > +out_unlock:
> > +       local_irq_restore(irq_flags);
> > +out:
> > +       user_access_restore(ua_flags);
> > +}
> > +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
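
For readers following the thread, the "read, stall, re-read" check in
__kcsan_setup_watchpoint() can be sketched in plain userspace C. This is only
an illustration under assumptions: snapshot_value() and value_unchanged() are
hypothetical names that do not exist in the patch, and plain volatile loads
stand in for READ_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read 1, 2, 4 or 8 bytes at ptr into *out.
 * Returns false for any other size, mirroring the "default: ignore" case.
 */
static bool snapshot_value(const volatile void *ptr, size_t size, uint64_t *out)
{
	assert(ptr != NULL && out != NULL);

	switch (size) {
	case 1:
		*out = *(const volatile uint8_t *)ptr;
		return true;
	case 2:
		*out = *(const volatile uint16_t *)ptr;
		return true;
	case 4:
		*out = *(const volatile uint32_t *)ptr;
		return true;
	case 8:
		*out = *(const volatile uint64_t *)ptr;
		return true;
	default:
		return false; /* sizes we do not diff */
	}
}

/*
 * Hypothetical helper: re-read the location after the stall. Returns false
 * only when the value could be read and differs from the earlier snapshot.
 */
static bool value_unchanged(const volatile void *ptr, size_t size,
			    uint64_t expected)
{
	uint64_t now;

	if (!snapshot_value(ptr, size, &now))
		return true; /* not diffed, so never inferred as a race */
	return now == expected;
}
```

A caller would take the snapshot, delay, and then treat a false return from
value_unchanged() as a race of unknown origin.
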
> > diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> > new file mode 100644
> > index 000000000000..6ddcbd185f3a
> > --- /dev/null
> > +++ b/kernel/kcsan/debugfs.c
> > @@ -0,0 +1,225 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/atomic.h>
> > +#include <linux/bsearch.h>
> > +#include <linux/bug.h>
> > +#include <linux/debugfs.h>
> > +#include <linux/init.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/mm.h>
> > +#include <linux/seq_file.h>
> > +#include <linux/sort.h>
> > +#include <linux/string.h>
> > +#include <linux/uaccess.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * Statistics counters.
> > + */
> > +static atomic_long_t counters[kcsan_counter_count];
> > +
> > +/*
> > + * Addresses for filtering functions from reporting. This list can be used as a
> > + * whitelist or blacklist.
> > + */
> > +static struct {
> > +       unsigned long *addrs; /* array of addresses */
> > +       size_t size; /* current size */
> > +       int used; /* number of elements used */
> > +       bool sorted; /* if elements are sorted */
> > +       bool whitelist; /* if list is a blacklist or whitelist */
> > +} report_filterlist = {
> > +       .addrs = NULL,
> > +       .size = 8, /* small initial size */
> > +       .used = 0,
> > +       .sorted = false,
> > +       .whitelist = false, /* default is blacklist */
> > +};
> > +static DEFINE_SPINLOCK(report_filterlist_lock);
> > +
> > +static const char *counter_to_name(enum kcsan_counter_id id)
> > +{
> > +       switch (id) {
> > +       case kcsan_counter_used_watchpoints:
> > +               return "used_watchpoints";
> > +       case kcsan_counter_setup_watchpoints:
> > +               return "setup_watchpoints";
> > +       case kcsan_counter_data_races:
> > +               return "data_races";
> > +       case kcsan_counter_no_capacity:
> > +               return "no_capacity";
> > +       case kcsan_counter_report_races:
> > +               return "report_races";
> > +       case kcsan_counter_races_unknown_origin:
> > +               return "races_unknown_origin";
> > +       case kcsan_counter_unencodable_accesses:
> > +               return "unencodable_accesses";
> > +       case kcsan_counter_encoding_false_positives:
> > +               return "encoding_false_positives";
> > +       case kcsan_counter_count:
> > +               BUG();
> > +       }
> > +       return NULL;
> > +}
> > +
> > +void kcsan_counter_inc(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_inc(&counters[id]);
> > +}
> > +
> > +void kcsan_counter_dec(enum kcsan_counter_id id)
> > +{
> > +       atomic_long_dec(&counters[id]);
> > +}
> > +
> > +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> > +{
> > +       const unsigned long a = *(const unsigned long *)rhs;
> > +       const unsigned long b = *(const unsigned long *)lhs;
> > +
> > +       return a < b ? -1 : a == b ? 0 : 1;
> > +}
> > +
> > +bool kcsan_skip_report(unsigned long func_addr)
> > +{
> > +       unsigned long symbolsize, offset;
> > +       unsigned long flags;
> > +       bool ret = false;
> > +
> > +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> > +               return false;
> > +       func_addr -= offset; /* get function start */
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       if (report_filterlist.used == 0)
> > +               goto out;
> > +
> > +       /* Sort array if it is unsorted, and then do a binary search. */
> > +       if (!report_filterlist.sorted) {
> > +               sort(report_filterlist.addrs, report_filterlist.used,
> > +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> > +               report_filterlist.sorted = true;
> > +       }
> > +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> > +                       report_filterlist.used, sizeof(unsigned long),
> > +                       cmp_filterlist_addrs);
> > +       if (report_filterlist.whitelist)
> > +               ret = !ret;
> > +
> > +out:
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +       return ret;
> > +}
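
As an aside, the lazy sort plus binary search in kcsan_skip_report() can be
tried out in userspace with qsort() and bsearch(). The sketch below is an
approximation under assumptions: struct filterlist and filterlist_skip() are
hypothetical names, and locking and the kallsyms lookup are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for report_filterlist. */
struct filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of valid entries */
	bool sorted;          /* true once addrs[] has been sorted */
	bool whitelist;       /* true: only listed functions are reported */
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for func_start should be suppressed. */
static bool filterlist_skip(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	assert(fl != NULL && (fl->used == 0 || fl->addrs != NULL));

	if (fl->used == 0)
		return false; /* an empty list filters nothing, in either mode */

	/* Sort lazily on first lookup after an insertion. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}

	found = bsearch(&func_start, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip if listed. Whitelist: skip if not listed. */
	return fl->whitelist ? !found : found;
}
```

Sorting on lookup rather than on insert keeps insertion cheap, at the cost of
one sort after each batch of insertions.
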
> > +
> > +static void set_report_filterlist_whitelist(bool whitelist)
> > +{
> > +       unsigned long flags;
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       report_filterlist.whitelist = whitelist;
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static void insert_report_filterlist(const char *func)
> > +{
> > +       unsigned long flags;
> > +       unsigned long addr = kallsyms_lookup_name(func);
> > +
> > +       if (!addr) {
> > +               pr_err("KCSAN: could not find function: '%s'\n", func);
> > +               return;
> > +       }
> > +
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +
> > +       if (report_filterlist.addrs == NULL)
> > +               report_filterlist.addrs = /* initial allocation */
> > +                       kvmalloc_array(report_filterlist.size,
> > +                                      sizeof(unsigned long), GFP_KERNEL);
> You need to use braces in both branches here:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

Done @ v3.
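
For illustration only, this is roughly the shape with braces in both branches.
It is a userspace sketch and not the actual v3 code: struct addr_list and
addr_list_push() are hypothetical names, malloc()/free() stand in for
kvmalloc_array()/kvfree(), and the allocation-failure checks are an addition
that the quoted hunk does not have.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of addresses. */
struct addr_list {
	unsigned long *addrs; /* NULL until the first push */
	size_t size;          /* capacity in elements, must start > 0 */
	size_t used;          /* number of elements stored */
};

/* Returns 0 on success, -1 if allocation failed (list left unchanged). */
static int addr_list_push(struct addr_list *l, unsigned long addr)
{
	assert(l != NULL && l->size > 0);

	if (l->addrs == NULL) {
		/* initial allocation */
		l->addrs = malloc(l->size * sizeof(*l->addrs));
		if (l->addrs == NULL)
			return -1;
	} else if (l->used == l->size) {
		/* full: double the capacity and copy the old entries over */
		size_t new_size = l->size * 2;
		unsigned long *new_addrs = malloc(new_size * sizeof(*new_addrs));

		if (new_addrs == NULL)
			return -1;
		memcpy(new_addrs, l->addrs, l->used * sizeof(*new_addrs));
		free(l->addrs);
		l->addrs = new_addrs;
		l->size = new_size;
	}

	l->addrs[l->used++] = addr;
	return 0;
}
```

Starting from a capacity of 2, five pushes would leave the list with a
capacity of 8.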

> > +       else if (report_filterlist.used == report_filterlist.size) {
> > +               /* resize filterlist */
> > +               unsigned long *new_addrs;
> > +
> > +               report_filterlist.size *= 2;
> > +               new_addrs = kvmalloc_array(report_filterlist.size,
> > +                                          sizeof(unsigned long), GFP_KERNEL);
> > +               memcpy(new_addrs, report_filterlist.addrs,
> > +                      report_filterlist.used * sizeof(unsigned long));
> > +               kvfree(report_filterlist.addrs);
> > +               report_filterlist.addrs = new_addrs;
> > +       }
> > +
> > +       /* Note: deduplicating should be done in userspace. */
> > +       report_filterlist.addrs[report_filterlist.used++] =
> > +               kallsyms_lookup_name(func);
> > +       report_filterlist.sorted = false;
> > +
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +}
> > +
> > +static int show_info(struct seq_file *file, void *v)
> > +{
> > +       int i;
> > +       unsigned long flags;
> > +
> > +       /* show stats */
> > +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> > +       for (i = 0; i < kcsan_counter_count; ++i)
> > +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> > +                          atomic_long_read(&counters[i]));
> > +
> > +       /* show filter functions, and filter type */
> > +       spin_lock_irqsave(&report_filterlist_lock, flags);
> > +       seq_printf(file, "\n%s functions: %s\n",
> > +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> > +                  report_filterlist.used == 0 ? "none" : "");
> > +       for (i = 0; i < report_filterlist.used; ++i)
> > +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> > +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> > +
> > +       return 0;
> > +}
> > +
> > +static int debugfs_open(struct inode *inode, struct file *file)
> > +{
> > +       return single_open(file, show_info, NULL);
> > +}
> > +
> > +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> > +                            size_t count, loff_t *off)
> > +{
> > +       char kbuf[KSYM_NAME_LEN];
> > +       char *arg;
> > +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> > +
> > +       if (copy_from_user(kbuf, buf, read_len))
> > +               return -EINVAL;
> > +       kbuf[read_len] = '\0';
> > +       arg = strstrip(kbuf);
> > +
> > +       if (!strncmp(arg, "on", sizeof("on") - 1))
> > +               WRITE_ONCE(kcsan_enabled, true);
> > +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> > +               WRITE_ONCE(kcsan_enabled, false);
> > +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> > +               set_report_filterlist_whitelist(true);
> > +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> > +               set_report_filterlist_whitelist(false);
> > +       else if (arg[0] == '!')
> > +               insert_report_filterlist(&arg[1]);
> > +       else
> > +               return -EINVAL;
> > +
> > +       return count;
> > +}
> > +
> > +static const struct file_operations debugfs_ops = { .read = seq_read,
> > +                                                   .open = debugfs_open,
> > +                                                   .write = debugfs_write,
> > +                                                   .release = single_release };
> > +
> > +void __init kcsan_debugfs_init(void)
> > +{
> > +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> > +}
> > diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> > new file mode 100644
> > index 000000000000..8f9b1ce0e59f
> > --- /dev/null
> > +++ b/kernel/kcsan/encoding.h
> > @@ -0,0 +1,94 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_ENCODING_H
> > +#define _MM_KCSAN_ENCODING_H
> > +
> > +#include <linux/bits.h>
> > +#include <linux/log2.h>
> > +#include <linux/mm.h>
> > +
> > +#include "kcsan.h"
> > +
> > +#define SLOT_RANGE PAGE_SIZE
> > +#define INVALID_WATCHPOINT 0
> > +#define CONSUMED_WATCHPOINT 1
> > +
> > +/*
> > + * The maximum useful size of accesses for which we set up watchpoints is the
> > + * max range of slots we check on an access.
> > + */
> > +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> > +
> > +/*
> > + * Number of bits we use to store size info.
> > + */
> > +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> > +/*
> > + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS) bits;
> > + * however, most 64-bit architectures do not use the full 64-bit address space.
> > + * Also, in order for a false positive to be observable 2 things need to happen:
> > + *
> > + *     1. different addresses but with the same encoded address race;
> > + *     2. and both map onto the same watchpoint slots;
> > + *
> > + * Both these are assumed to be very unlikely. However, in case it still
> > + * happens, the report logic will filter out the false positive (see report.c).
> > + */
> > +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> > +
> > +/*
> > + * Masks to set/retrieve the encoded data.
> > + */
> > +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> > +#define WATCHPOINT_SIZE_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> > +#define WATCHPOINT_ADDR_MASK                                                   \
> > +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> > +
> > +static inline bool check_encodable(unsigned long addr, size_t size)
> > +{
> > +       return size <= MAX_ENCODABLE_SIZE;
> > +}
> > +
> > +static inline long encode_watchpoint(unsigned long addr, size_t size,
> > +                                    bool is_write)
> > +{
> > +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> > +                     (size << WATCHPOINT_ADDR_BITS) |
> > +                     (addr & WATCHPOINT_ADDR_MASK));
> > +}
> > +
> > +static inline bool decode_watchpoint(long watchpoint,
> > +                                    unsigned long *addr_masked, size_t *size,
> > +                                    bool *is_write)
> > +{
> > +       if (watchpoint == INVALID_WATCHPOINT ||
> > +           watchpoint == CONSUMED_WATCHPOINT)
> > +               return false;
> > +
> > +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> > +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> > +               WATCHPOINT_ADDR_BITS;
> > +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> > +
> > +       return true;
> > +}
> > +
> > +/*
> > + * Return watchpoint slot for an address.
> > + */
> > +static inline int watchpoint_slot(unsigned long addr)
> > +{
> > +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> > +}
> > +
> > +static inline bool matching_access(unsigned long addr1, size_t size1,
> > +                                  unsigned long addr2, size_t size2)
> > +{
> > +       unsigned long end_range1 = addr1 + size1 - 1;
> > +       unsigned long end_range2 = addr2 + size2 - 1;
> > +
> > +       return addr1 <= end_range2 && addr2 <= end_range1;
> > +}
> > +
> > +#endif /* _MM_KCSAN_ENCODING_H */
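
To make the bit layout easier to check, here is a userspace sketch of the
encode/decode round trip. It is an illustration under assumptions: the SK_ and
sk_ names are hypothetical, the word is fixed at 64 bits, and SK_SIZE_BITS is
14 on the assumption of a 4 KiB PAGE_SIZE (MAX_ENCODABLE_SIZE = 8192,
bits_per(8192) = 14). The sketch derives both masks from SK_ADDR_BITS so the
three fields tile the word exactly. As far as I can tell, because GENMASK()
bounds are inclusive, the quoted WATCHPOINT_SIZE_MASK spans SIZE_BITS + 1 bits
and WATCHPOINT_ADDR_MASK spans ADDR_BITS - 1 bits, so the two layouts differ
by one bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; see the assumptions in the text above. */
#define SK_WORD_BITS 64
#define SK_MAX_SIZE  8192u
#define SK_SIZE_BITS 14
#define SK_ADDR_BITS (SK_WORD_BITS - 1 - SK_SIZE_BITS)

/* Layout: [63] is-write | [62:49] size | [48:0] masked address */
#define SK_WRITE_MASK (UINT64_C(1) << (SK_WORD_BITS - 1))
#define SK_ADDR_MASK  ((UINT64_C(1) << SK_ADDR_BITS) - 1)
#define SK_SIZE_MASK  (((UINT64_C(1) << SK_SIZE_BITS) - 1) << SK_ADDR_BITS)

static uint64_t sk_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= SK_MAX_SIZE); /* larger accesses are not encodable */

	return (is_write ? SK_WRITE_MASK : 0) |
	       ((uint64_t)size << SK_ADDR_BITS) |
	       (addr & SK_ADDR_MASK);
}

static void sk_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp & SK_SIZE_MASK) >> SK_ADDR_BITS);
	*is_write = (wp & SK_WRITE_MASK) != 0;
}
```

Decoding returns only the masked address, which is why the report logic still
has to compare the full addresses to filter out encoding false positives.
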
> > diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> > new file mode 100644
> > index 000000000000..45cf2fffd8a0
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.c
> > @@ -0,0 +1,86 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +/*
> > + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> > + * see Documentation/dev-tools/kcsan.rst.
> > + */
> > +
> > +#include <linux/export.h>
> > +
> > +#include "kcsan.h"
> > +
> > +/*
> > + * KCSAN uses the same instrumentation that is emitted by supported compilers
> > + * for Thread Sanitizer (TSAN).
> > + *
> > + * When enabled, the compiler emits instrumentation calls (the functions
> > + * prefixed with "__tsan" below) for all loads and stores that it generated;
> > + * inline asm is not instrumented.
> > + */
> > +
> > +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> > +       void __tsan_read##size(void *ptr)                                      \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> > +       void __tsan_write##size(void *ptr)                                     \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_write##size)
> > +
> > +DEFINE_TSAN_READ_WRITE(1);
> > +DEFINE_TSAN_READ_WRITE(2);
> > +DEFINE_TSAN_READ_WRITE(4);
> > +DEFINE_TSAN_READ_WRITE(8);
> > +DEFINE_TSAN_READ_WRITE(16);
> > +
> > +/*
> > + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> > + * but e.g. recent versions of Clang do.
> > + */
> > +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> > +       void __tsan_unaligned_read##size(void *ptr)                            \
> > +       {                                                                      \
> > +               __kcsan_check_read(ptr, size);                                 \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> > +       void __tsan_unaligned_write##size(void *ptr)                           \
> > +       {                                                                      \
> > +               __kcsan_check_write(ptr, size);                                \
> > +       }                                                                      \
> > +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> > +
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> > +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> > +
> > +void __tsan_read_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_read(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_read_range);
> > +
> > +void __tsan_write_range(void *ptr, size_t size)
> > +{
> > +       __kcsan_check_write(ptr, size);
> > +}
> > +EXPORT_SYMBOL(__tsan_write_range);
> > +
> > +/*
> > + * The below are not required by KCSAN, but can still be emitted by the compiler.
> > + */
> > +void __tsan_func_entry(void *call_pc)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_entry);
> > +void __tsan_func_exit(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_func_exit);
> > +void __tsan_init(void)
> > +{
> > +}
> > +EXPORT_SYMBOL(__tsan_init);
> > diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> > new file mode 100644
> > index 000000000000..429479b3041d
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.h
> > @@ -0,0 +1,140 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_KCSAN_H
> > +#define _MM_KCSAN_KCSAN_H
> > +
> > +#include <linux/kcsan.h>
> > +
> > +/*
> > + * Total number of watchpoints. An address range maps into a specific slot as
> > + * specified in `encoding.h`. Although a larger number of watchpoints may not even
> > + * be usable due to limited thread count, a larger value will improve
> > + * performance due to reducing cache-line contention.
> > + */
> > +#define KCSAN_NUM_WATCHPOINTS 64
> > +
> > +/*
> > + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> > + *
> > + *     1. the address slot is already occupied, check if any adjacent slots are
> > + *        free;
> > + *     2. accesses that straddle a slot boundary due to size that exceeds a
> > + *        slot's range may check adjacent slots if any watchpoint matches.
> > + *
> > + * Note that accesses with very large size may still miss a watchpoint; however,
> > + * given this should be rare, this is a reasonable trade-off to make, since this
> > + * will avoid:
> > + *
> > + *     1. excessive contention between watchpoint checks and setup;
> > + *     2. larger number of simultaneous watchpoints without sacrificing
> > + *        performance.
> > + */
> > +#define KCSAN_CHECK_ADJACENT 1
> > +
> > +/*
> > + * Globally enable and disable KCSAN.
> > + */
> > +extern bool kcsan_enabled;
> > +
> > +/*
> > + * Helper that returns true if access to ptr should be considered as an atomic
> > + * access, even though it is not explicitly atomic.
> > + */
> > +bool kcsan_is_atomic(const volatile void *ptr);
> > +
> > +/*
> > + * Initialize debugfs file.
> > + */
> > +void kcsan_debugfs_init(void);
> > +
> > +enum kcsan_counter_id {
> Labels in enums should be capitalized:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl

Done @ v3.
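
As a sketch of what the capitalized form could look like (the names below are
an assumption for illustration, not copied from v3), together with the
matching name lookup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capitalized spelling of the counter ids. */
enum kcsan_counter_id {
	KCSAN_COUNTER_USED_WATCHPOINTS,
	KCSAN_COUNTER_SETUP_WATCHPOINTS,
	KCSAN_COUNTER_DATA_RACES,
	KCSAN_COUNTER_NO_CAPACITY,
	KCSAN_COUNTER_REPORT_RACES,
	KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN,
	KCSAN_COUNTER_UNENCODABLE_ACCESSES,
	KCSAN_COUNTER_ENCODING_FALSE_POSITIVES,

	KCSAN_COUNTER_COUNT, /* number of counters */
};

/* Catch a new counter being added without updating the lookup below. */
static_assert(KCSAN_COUNTER_COUNT == 8,
	      "update counter_to_name() when adding counters");

static const char *counter_to_name(enum kcsan_counter_id id)
{
	switch (id) {
	case KCSAN_COUNTER_USED_WATCHPOINTS:
		return "used_watchpoints";
	case KCSAN_COUNTER_SETUP_WATCHPOINTS:
		return "setup_watchpoints";
	case KCSAN_COUNTER_DATA_RACES:
		return "data_races";
	case KCSAN_COUNTER_NO_CAPACITY:
		return "no_capacity";
	case KCSAN_COUNTER_REPORT_RACES:
		return "report_races";
	case KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN:
		return "races_unknown_origin";
	case KCSAN_COUNTER_UNENCODABLE_ACCESSES:
		return "unencodable_accesses";
	case KCSAN_COUNTER_ENCODING_FALSE_POSITIVES:
		return "encoding_false_positives";
	case KCSAN_COUNTER_COUNT:
		break; /* not a real counter */
	}
	return NULL;
}
```

The userspace sketch returns NULL for the sentinel instead of calling BUG(),
which has no userspace equivalent here.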

> > +       /*
> > +        * Number of watchpoints currently in use.
> > +        */
> > +       kcsan_counter_used_watchpoints,
> > +
> > +       /*
> > +        * Total number of watchpoints set up.
> > +        */
> > +       kcsan_counter_setup_watchpoints,
> > +
> > +       /*
> > +        * Total number of data-races.
> > +        */
> > +       kcsan_counter_data_races,
> > +
> > +       /*
> > +        * Number of times no watchpoints were available.
> > +        */
> > +       kcsan_counter_no_capacity,
> > +
> > +       /*
> > +        * A thread checking a watchpoint raced with another checking thread;
> > +        * only one will be reported.
> > +        */
> > +       kcsan_counter_report_races,
> > +
> > +       /*
> > +        * Observed data value change, but writer thread unknown.
> > +        */
> > +       kcsan_counter_races_unknown_origin,
> > +
> > +       /*
> > +        * The access cannot be encoded to a valid watchpoint.
> > +        */
> > +       kcsan_counter_unencodable_accesses,
> > +
> > +       /*
> > +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> > +        * accesses.
> > +        */
> > +       kcsan_counter_encoding_false_positives,
> > +
> > +       kcsan_counter_count, /* number of counters */
> > +};
> > +
> > +/*
> > + * Increment/decrement counter with given id; avoid calling these in fast-path.
> > + */
> > +void kcsan_counter_inc(enum kcsan_counter_id id);
> > +void kcsan_counter_dec(enum kcsan_counter_id id);
> > +
> > +/*
> > + * Returns true if data-races in the function symbol that maps to addr (offsets
> > + * are ignored) should *not* be reported.
> > + */
> > +bool kcsan_skip_report(unsigned long func_addr);
> > +
> > +enum kcsan_report_type {
> > +       /*
> > +        * The thread that set up the watchpoint and briefly stalled was
> > +        * signalled that another thread triggered the watchpoint, and thus a
> > +        * race was encountered.
> > +        */
> > +       kcsan_report_race_setup,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, therefore a race
> > +        * was encountered.
> > +        */
> > +       kcsan_report_race_check,
> > +
> > +       /*
> > +        * A thread encountered a watchpoint for the access, but the other
> > +        * racing thread can no longer be signaled that a race occurred.
> > +        */
> > +       kcsan_report_race_check_race,
> > +
> > +       /*
> > +        * No other thread was observed to race with the access, but the data
> > +        * value before and after the stall differs.
> > +        */
> > +       kcsan_report_race_unknown_origin,
> > +};
> > +/*
> > + * Print a race report from thread that encountered the race.
> > + */
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type);
> > +
> > +#endif /* _MM_KCSAN_KCSAN_H */
> > diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> > new file mode 100644
> > index 000000000000..517db539e4e7
> > --- /dev/null
> > +++ b/kernel/kcsan/report.c
> > @@ -0,0 +1,306 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/preempt.h>
> > +#include <linux/printk.h>
> > +#include <linux/sched.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/stacktrace.h>
> > +
> > +#include "kcsan.h"
> > +#include "encoding.h"
> > +
> > +/*
> > + * Max. number of stack entries to show in the report.
> > + */
> > +#define NUM_STACK_ENTRIES 16
> > +
> > +/*
> > + * Other thread info: communicated from other racing thread to thread that set
> > + * up the watchpoint, which then prints the complete report atomically. Only
> > + * need one struct, as all threads should be serialized regardless to print
> > + * the reports, with reporting being in the slow-path.
> > + */
> > +static struct {
> > +       const volatile void *ptr;
> > +       size_t size;
> > +       bool is_write;
> > +       int task_pid;
> > +       int cpu_id;
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> > +       int num_stack_entries;
> > +} other_info = { .ptr = NULL };
> > +
> > +static DEFINE_SPINLOCK(other_info_lock);
> > +static DEFINE_SPINLOCK(report_lock);
> > +
> > +static bool set_or_lock_other_info(unsigned long *flags,
> > +                                  const volatile void *ptr, size_t size,
> > +                                  bool is_write, int cpu_id,
> > +                                  enum kcsan_report_type type)
> > +{
> > +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> > +               return true;
> > +
> > +       for (;;) {
> > +               spin_lock_irqsave(&other_info_lock, *flags);
> > +
> > +               switch (type) {
> > +               case kcsan_report_race_check:
> > +                       if (other_info.ptr != NULL) {
> > +                               /* still in use, retry */
> > +                               break;
> > +                       }
> > +                       other_info.ptr = ptr;
> > +                       other_info.size = size;
> > +                       other_info.is_write = is_write;
> > +                       other_info.task_pid =
> > +                               in_task() ? task_pid_nr(current) : -1;
> > +                       other_info.cpu_id = cpu_id;
> > +                       other_info.num_stack_entries = stack_trace_save(
> > +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> > +                       /*
> > +                        * other_info may now be consumed by thread we raced
> > +                        * with.
> > +                        */
> > +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> > +                       return false;
> > +
> > +               case kcsan_report_race_setup:
> > +                       if (other_info.ptr == NULL)
> > +                               break; /* no data available yet, retry */
> > +
> > +                       /*
> > +                        * First check if matching based on how watchpoint was
> > +                        * encoded.
> > +                        */
> > +                       if (!matching_access((unsigned long)other_info.ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr &
> > +                                                    WATCHPOINT_ADDR_MASK,
> > +                                            size))
> > +                               break; /* mismatching access, retry */
> > +
> > +                       if (!matching_access((unsigned long)other_info.ptr,
> > +                                            other_info.size,
> > +                                            (unsigned long)ptr, size)) {
> > +                               /*
> > +                                * If the actual accesses do not match, this was
> > +                                * a false positive due to watchpoint encoding.
> > +                                */
> > +                               other_info.ptr = NULL; /* mark for reuse */
> > +                               kcsan_counter_inc(
> > +                                       kcsan_counter_encoding_false_positives);
> > +                               spin_unlock_irqrestore(&other_info_lock,
> > +                                                      *flags);
> > +                               return false;
> > +                       }
> > +
> > +                       /*
> > +                        * Matching access: keep other_info locked, as this
> > +                        * thread uses it to print the full report; unlocked in
> > +                        * end_report.
> > +                        */
> > +                       return true;
> > +
> > +               default:
> > +                       BUG();
> > +               }
> > +
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +       }
> > +}
> > +
> > +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               /* irqsaved already via other_info_lock */
> > +               spin_lock(&report_lock);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_lock_irqsave(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> > +{
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               other_info.ptr = NULL; /* mark for reuse */
> > +               spin_unlock(&report_lock);
> > +               spin_unlock_irqrestore(&other_info_lock, *flags);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               spin_unlock_irqrestore(&report_lock, *flags);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +}
> > +
> > +static const char *get_access_type(bool is_write)
> > +{
> > +       return is_write ? "write" : "read";
> > +}
> > +
> > +/* Return thread description: in task or interrupt. */
> > +static const char *get_thread_desc(int task_id)
> > +{
> > +       if (task_id != -1) {
> > +               static char buf[32]; /* safe: protected by report_lock */
> > +
> > +               snprintf(buf, sizeof(buf), "task %i", task_id);
> > +               return buf;
> > +       }
> > +       return in_nmi() ? "NMI" : "interrupt";
> > +}
> > +
> > +/* Helper to skip KCSAN-related functions in stack-trace. */
> > +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> > +{
> > +       char buf[64];
> > +       int skip = 0;
> > +
> > +       for (; skip < num_entries; ++skip) {
> > +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> > +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> > +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> > +                       break;
> > +               }
> > +       }
> > +       return skip;
> > +}
> FWIW another option is to put all KCSAN-related functions in a
> separate code section and check if the function addresses are in the
> address range belonging to that section.
> This will work even with non-symbolized stacks.

Thanks for the suggestion. Is it worth it, i.e. would it simplify the
design and code? If it simplified the design (or made the fast-path
significantly faster), then yes; otherwise I prefer the simplest
possible solution here. AFAIK, it would make this neither simpler nor
faster. Non-symbolized stacks should not be the common use-case (how
would one usefully debug a data-race without symbols?).
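
For reference, a minimal userspace sketch of the section-based
alternative suggested above. It assumes an ELF toolchain (GCC or Clang
with GNU ld or lld), where the linker defines __start_<name> and
__stop_<name> for sections whose names are valid C identifiers. The
section name kcsan_text and all function names are hypothetical and not
part of the patch; in the kernel this would use linker-script symbols
instead, much like in_sched_functions() does for __sched code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker: place runtime functions in their own section. */
#define __kcsan_text __attribute__((section("kcsan_text"), noinline, used))

/* Section bounds, provided by the linker (assumption: ELF + GNU ld/lld). */
extern const char __start_kcsan_text[];
extern const char __stop_kcsan_text[];

/* Stand-in for a sanitizer runtime function. */
static __kcsan_text int runtime_helper(int x)
{
	return x + 1;
}

/* Stand-in for ordinary kernel code outside the section. */
static __attribute__((noinline)) int ordinary_function(int x)
{
	return x * 2;
}

/* True if addr lies inside the dedicated section. */
static bool in_kcsan_text(uintptr_t addr)
{
	return addr >= (uintptr_t)__start_kcsan_text &&
	       addr < (uintptr_t)__stop_kcsan_text;
}

/* Count leading entries that belong to the runtime; no symbolization. */
static int skip_runtime_frames(const uintptr_t entries[], int num_entries)
{
	int skip = 0;

	while (skip < num_entries && in_kcsan_text(entries[skip]))
		++skip;
	return skip;
}

/* Fake "stack": two runtime frames followed by one ordinary frame. */
static int demo_skip(void)
{
	const uintptr_t entries[] = {
		(uintptr_t)runtime_helper,
		(uintptr_t)runtime_helper,
		(uintptr_t)ordinary_function,
	};

	return skip_runtime_frames(entries, 3);
}
```

Here demo_skip() skips the two entries that point into the marked
section and stops at the first ordinary one. The trade-off discussed
above still applies: every runtime function would need the marker,
whereas the string match needs no annotations.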

> > +/* Compares symbolized strings of addr1 and addr2. */
> > +static int sym_strcmp(void *addr1, void *addr2)
> > +{
> > +       char buf1[64];
> > +       char buf2[64];
> > +
> > +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> > +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> > +       return strncmp(buf1, buf2, sizeof(buf1));
> > +}
> > +
> > +/*
> > + * Returns true if a report was generated, false otherwise.
> > + */
> > +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> > +                         int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> > +       int num_stack_entries =
> > +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> > +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> > +       int other_skipnr;
> > +
> > +       /* Check if the top stackframe is in a blacklisted function. */
> > +       if (kcsan_skip_report(stack_entries[skipnr]))
> > +               return false;
> > +       if (type == kcsan_report_race_setup) {
> > +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> > +                                               other_info.num_stack_entries);
> > +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> > +                       return false;
> > +       }
> > +
> > +       /* Print report header. */
> > +       pr_err("==================================================================\n");
> > +       switch (type) {
> > +       case kcsan_report_race_setup: {
> > +               void *this_fn = (void *)stack_entries[skipnr];
> > +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> > +               int cmp;
> > +
> > +               /*
> > +                * Order functions lexicographically for consistent bug titles.
> > +                * Do not print offset of functions to keep title short.
> > +                */
> > +               cmp = sym_strcmp(other_fn, this_fn);
> > +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> > +                      cmp < 0 ? other_fn : this_fn,
> > +                      cmp < 0 ? this_fn : other_fn);
> > +       } break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("BUG: KCSAN: data-race in %pS\n",
> > +                      (void *)stack_entries[skipnr]);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +
> > +       pr_err("\n");
> > +
> > +       /* Print information about the racing accesses. */
> > +       switch (type) {
> > +       case kcsan_report_race_setup:
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(other_info.is_write), other_info.ptr,
> > +                      other_info.size, get_thread_desc(other_info.task_pid),
> > +                      other_info.cpu_id);
> > +
> > +               /* Print the other thread's stack trace. */
> > +               stack_trace_print(other_info.stack_entries + other_skipnr,
> > +                                 other_info.num_stack_entries - other_skipnr,
> > +                                 0);
> > +
> > +               pr_err("\n");
> > +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       case kcsan_report_race_unknown_origin:
> > +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> > +                      get_access_type(is_write), ptr, size,
> > +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> > +                      cpu_id);
> > +               break;
> > +
> > +       default:
> > +               BUG();
> > +       }
> > +       /* Print stack trace of this thread. */
> > +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> > +                         0);
> > +
> > +       /* Print report footer. */
> > +       pr_err("\n");
> > +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> > +       dump_stack_print_info(KERN_DEFAULT);
> > +       pr_err("==================================================================\n");
> > +
> > +       return true;
> > +}
> > +
> > +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> > +                 int cpu_id, enum kcsan_report_type type)
> > +{
> > +       unsigned long flags = 0;
> > +
> > +       if (type == kcsan_report_race_check_race)
> > +               return;
> > +
> > +       kcsan_disable_current();
> > +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> > +               start_report(&flags, type);
> > +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> > +                   panic_on_warn)
> > +                       panic("panic_on_warn set ...\n");
> > +
> > +               end_report(&flags, type);
> > +       }
> > +       kcsan_enable_current();
> > +}
> > diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> > new file mode 100644
> > index 000000000000..68c896a24529
> > --- /dev/null
> > +++ b/kernel/kcsan/test.c
> > @@ -0,0 +1,117 @@
> > +// SPDX-License-Identifier: GPL-2.0
> IIRC checkpatch.pl requires all SPDX headers to look like this one
> (C++-style, not C-style).
> Please double check and fix the headers in other files if necessary.

Checkpatch is happy: // for .c files, and /* */ for .h files.

> This file might also use some comments, now it's not easy to
> understand what it's testing.

Done @ v3.
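
To make the expectations of test_matching_access() in the quoted test.c
concrete: they are consistent with an inclusive byte-range overlap
check. The function body below is an assumption inferred from the five
test cases, not a copy of encoding.h.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch: two accesses match if their byte ranges
 * [addr, addr + size - 1] share at least one byte.
 */
static bool matching_access(unsigned long addr1, size_t size1,
			    unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of access 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of access 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

Under this reading, a 2-byte access at 10 overlaps a 1-byte access at
11, while two adjacent 1-byte accesses at 10 and 11 do not overlap.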

> > +
> > +#include <linux/init.h>
> > +#include <linux/kernel.h>
> > +#include <linux/printk.h>
> > +#include <linux/random.h>
> > +#include <linux/types.h>
> > +
> > +#include "encoding.h"
> > +
> > +#define ITERS_PER_TEST 2000
> > +
> > +/* Test requirements. */
> > +static bool test_requires(void)
> > +{
> > +       /* random should be initialized */
> > +       return prandom_u32() + prandom_u32() != 0;
> > +}
> > +
> > +/* Test watchpoint encode and decode. */
> > +static bool test_encode_decode(void)
> > +{
> > +       int i;
> > +
> > +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> > +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> > +               bool is_write = prandom_u32() % 2;
> > +               unsigned long addr;
> > +
> > +               prandom_bytes(&addr, sizeof(addr));
> > +               if (WARN_ON(!check_encodable(addr, size)))
> > +                       return false;
> > +
> > +               /* encode and decode */
> > +               {
> > +                       const long encoded_watchpoint =
> > +                               encode_watchpoint(addr, size, is_write);
> > +                       unsigned long verif_masked_addr;
> > +                       size_t verif_size;
> > +                       bool verif_is_write;
> > +
> > +                       /* check special watchpoints */
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(decode_watchpoint(
> > +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +
> > +                       /* check decoding watchpoint returns same data */
> > +                       if (WARN_ON(!decode_watchpoint(
> > +                                   encoded_watchpoint, &verif_masked_addr,
> > +                                   &verif_size, &verif_is_write)))
> > +                               return false;
> > +                       if (WARN_ON(verif_masked_addr !=
> > +                                   (addr & WATCHPOINT_ADDR_MASK)))
> > +                               goto fail;
> > +                       if (WARN_ON(verif_size != size))
> > +                               goto fail;
> > +                       if (WARN_ON(is_write != verif_is_write))
> > +                               goto fail;
> > +
> > +                       continue;
> > +fail:
> > +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> > +                              __func__, is_write ? "write" : "read", size,
> > +                              addr, encoded_watchpoint,
> > +                              verif_is_write ? "write" : "read", verif_size,
> > +                              verif_masked_addr);
> > +                       return false;
> > +               }
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static bool test_matching_access(void)
> > +{
> > +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> > +               return false;
> > +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> > +               return false;
> > +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> > +               return false;
> > +       return true;
> > +}
> > +
> > +static int __init kcsan_selftest(void)
> > +{
> > +       int passed = 0;
> > +       int total = 0;
> > +
> > +#define RUN_TEST(do_test)                                                      \
> > +       do {                                                                   \
> > +               ++total;                                                       \
> > +               if (do_test())                                                 \
> > +                       ++passed;                                              \
> > +               else                                                           \
> > +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> > +       } while (0)
> > +
> > +       RUN_TEST(test_requires);
> > +       RUN_TEST(test_encode_decode);
> > +       RUN_TEST(test_matching_access);
> > +
> > +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> > +       if (passed != total)
> > +               panic("KCSAN selftests failed");
> > +       return 0;
> > +}
> > +postcore_initcall(kcsan_selftest);
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 93d97f9b0157..35accd1d93de 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
> >
> >  source "lib/Kconfig.ubsan"
> >
> > +source "lib/Kconfig.kcsan"
> > +
> >  config ARCH_HAS_DEVMEM_IS_ALLOWED
> >         bool
> >
> > diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> > new file mode 100644
> > index 000000000000..3e1f1acfb24b
> > --- /dev/null
> > +++ b/lib/Kconfig.kcsan
> > @@ -0,0 +1,88 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config HAVE_ARCH_KCSAN
> > +       bool
> > +
> > +menuconfig KCSAN
> > +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> > +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> > +       default n
> > +       help
> > +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> > +         uses a watchpoint-based sampling approach to detect races.
> > +
> > +if KCSAN
> > +
> > +config KCSAN_SELFTEST
> > +       bool "KCSAN: perform short selftests on boot"
> > +       default y
> > +       help
> > +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> > +
> > +config KCSAN_EARLY_ENABLE
> > +       bool "KCSAN: early enable"
> > +       default y
> > +       help
> > +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> > +         later be enabled/disabled via debugfs.
> > +
> > +config KCSAN_UDELAY_MAX_TASK
> > +       int "KCSAN: maximum delay in microseconds (for tasks)"
> > +       default 80
> > +       help
> > +         For tasks, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_UDELAY_MAX_INTERRUPT
> > +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> > +       default 20
> > +       help
> > +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> > +
> > +config KCSAN_DELAY_RANDOMIZE
> > +       bool "KCSAN: randomize delays"
> > +       default y
> > +       help
> > +         If delays should be randomized; if false, the chosen delay is simply
> > +         the maximum values defined above.
> > +
> > +config KCSAN_WATCH_SKIP_INST
> > +       int "KCSAN: watchpoint instruction skip"
> > +       default 2000
> > +       help
> > +         The number of per-CPU memory operations to skip watching, before
> > +         another watchpoint is set up; in other words, 1 in
> > +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> > +         watchpoint. A smaller value results in more aggressive race
> > +         detection, whereas a larger value improves system performance at the
> > +         cost of missing some races.
> > +
> > +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +       bool "KCSAN: report races of unknown origin"
> > +       default y
> > +       help
> > +         If KCSAN should report races where only one access is known, and the
> > +         conflicting access is of unknown origin. This type of race is
> > +         reported if it was only possible to infer a race due to a data-value
> > +         change while an access is being delayed on a watchpoint.
> > +
> > +config KCSAN_IGNORE_ATOMICS
> > +       bool "KCSAN: do not instrument marked atomic accesses"
> > +       default n
> > +       help
> > +         If enabled, never instruments marked atomic accesses. This results in
> > +         not reporting data-races where one access is atomic and the other is
> > +         a plain access.
> > +
> Isn't it better to decide at runtime, whether we want to ignore atomics or not?

See below.

> > +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> > +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> > +       default n
> > +       help
> > +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> > +         This option should only be used to prune initial data-races found in
> > +         existing code.
> Overall, I think it's better to make most of these configs boot-time flags.
> This way one won't need to rebuild the kernel every time they want to
> turn some option on or off.

From a design point of view, this complicates things on several
fronts. For one, I would prefer having config options in one place.
More to the point, most of these were added to "tame" syzbot and keep
reporting volume low initially. I do not expect these to be switched
frequently, so for simplicity's sake, and to optimize for the common
use-case, it'll be better to keep them as-is. Eventually, these might
even go away completely.

I will add a comment to that effect above these options for v3.
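
To illustrate what the boot-time alternative would involve, here is a
minimal userspace sketch. The parameter name kcsan.ignore_atomics, the
variable and both helpers are hypothetical; the patch defines no such
flag, and a kernel version would register the parameter through the
kernel's own boot-parameter mechanism rather than parse a string
itself.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical runtime switch mirroring CONFIG_KCSAN_IGNORE_ATOMICS. */
static bool ignore_atomics;

/* Hypothetical parameter name; not defined anywhere in the patch. */
static const char param[] = "kcsan.ignore_atomics=";

/* Scan a space-separated command line for the parameter. */
static void parse_cmdline(const char *cmdline)
{
	const char *p = cmdline;

	while ((p = strstr(p, param)) != NULL) {
		/* Accept only whole tokens: at the start or after a space. */
		if (p == cmdline || p[-1] == ' ')
			ignore_atomics = (p[sizeof(param) - 1] == '1');
		p += sizeof(param) - 1;
	}
}

/* Runtime variant: each instrumented access pays for a load and branch. */
static bool should_check(bool is_atomic)
{
	return !(is_atomic && ignore_atomics);
}
```

With a Kconfig option the equivalent condition is a compile-time
constant, so the compiler can drop the branch entirely. That is one
reason the build-time form is the simpler choice for options that are
rarely switched.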

> > +config KCSAN_DEBUG
> > +       bool "Debugging of KCSAN internals"
> > +       default n
> > +
> > +endif # KCSAN
> > diff --git a/lib/Makefile b/lib/Makefile
> > index c5892807e06f..778ab704e3ad 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
> >  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
> >  endif
> >
> > +# Used by KCSAN while enabled, avoid recursion.
> > +KCSAN_SANITIZE_random32.o := n
> > +
> >  lib-y := ctype.o string.o vsprintf.o cmdline.o \
> >          rbtree.o radix-tree.o timerqueue.o xarray.o \
> >          idr.o extable.o \
> > diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> > new file mode 100644
> > index 000000000000..caf1111a28ae
> > --- /dev/null
> > +++ b/scripts/Makefile.kcsan
> > @@ -0,0 +1,6 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +ifdef CONFIG_KCSAN
> > +
> > +CFLAGS_KCSAN := -fsanitize=thread
> > +
> > +endif # CONFIG_KCSAN
> > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> > index 179d55af5852..0e78abab7d83 100644
> > --- a/scripts/Makefile.lib
> > +++ b/scripts/Makefile.lib
> > @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
> >         $(CFLAGS_KCOV))
> >  endif
> >
> > +#
> > +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> "KernelConcurrencySanitizer" or "Kernel Concurrency Sanitizer", maybe?

Done @ v3.

> > +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> > +#
> > +ifeq ($(CONFIG_KCSAN),y)
> > +_c_flags += $(if $(patsubst n%,, \
> > +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> > +       $(CFLAGS_KCSAN))
> > +endif
> > +
> >  # $(srctree)/$(src) for including checkin headers from generated source files
> >  # $(objtree)/$(obj) for including generated headers from checkin source files
> >  ifeq ($(KBUILD_EXTMOD),)
> > --
> > 2.23.0.866.gb869b98d4c-goog
> >

Thanks for your comments!
-- Marco



* Re: [PATCH v2 7/8] locking/atomics, kcsan: Add KCSAN instrumentation
  2019-10-17 14:13   ` Marco Elver
  (?)
@ 2019-10-22 12:33   ` Mark Rutland
  2019-10-22 18:17       ` Marco Elver
  -1 siblings, 1 reply; 88+ messages in thread
From: Mark Rutland @ 2019-10-22 12:33 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, npiggin, paulmck, peterz, tglx, will, kasan-dev,
	linux-arch, linux-doc, linux-efi, linux-kbuild, linux-kernel,
	linux-mm, x86

On Thu, Oct 17, 2019 at 04:13:04PM +0200, Marco Elver wrote:
> This adds KCSAN instrumentation to atomic-instrumented.h.
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Use kcsan_check{,_atomic}_{read,write} instead of
>   kcsan_check_{access,atomic}.
> * Introduce __atomic_check_{read,write} [Suggested by Mark Rutland].
> ---
>  include/asm-generic/atomic-instrumented.h | 393 +++++++++++-----------
>  scripts/atomic/gen-atomic-instrumented.sh |  17 +-
>  2 files changed, 218 insertions(+), 192 deletions(-)

The script changes and generated code look fine to me, so FWIW:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
> index e09812372b17..8b8b2a6f8d68 100755
> --- a/scripts/atomic/gen-atomic-instrumented.sh
> +++ b/scripts/atomic/gen-atomic-instrumented.sh
> @@ -20,7 +20,7 @@ gen_param_check()
>  	# We don't write to constant parameters
>  	[ ${type#c} != ${type} ] && rw="read"
>  
> -	printf "\tkasan_check_${rw}(${name}, sizeof(*${name}));\n"
> +	printf "\t__atomic_check_${rw}(${name}, sizeof(*${name}));\n"
>  }
>  
>  #gen_param_check(arg...)
> @@ -107,7 +107,7 @@ cat <<EOF
>  #define ${xchg}(ptr, ...)						\\
>  ({									\\
>  	typeof(ptr) __ai_ptr = (ptr);					\\
> -	kasan_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
> +	__atomic_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
>  	arch_${xchg}(__ai_ptr, __VA_ARGS__);				\\
>  })
>  EOF
> @@ -148,6 +148,19 @@ cat << EOF
>  
>  #include <linux/build_bug.h>
>  #include <linux/kasan-checks.h>
> +#include <linux/kcsan-checks.h>
> +
> +static inline void __atomic_check_read(const volatile void *v, size_t size)
> +{
> +	kasan_check_read(v, size);
> +	kcsan_check_atomic_read(v, size);
> +}
> +
> +static inline void __atomic_check_write(const volatile void *v, size_t size)
> +{
> +	kasan_check_write(v, size);
> +	kcsan_check_atomic_write(v, size);
> +}
>  
>  EOF
>  
> -- 
> 2.23.0.866.gb869b98d4c-goog
> 
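
The wrapper pattern in the quoted hunk can be sketched in userspace as
follows. The kasan_check_* and kcsan_check_atomic_* functions here are
counting stubs written for this example, not the kernel's
implementations, and the wrappers drop the leading double underscore
because such names are reserved outside the kernel.

```c
#include <stddef.h>

/* Counters standing in for the real sanitizer runtimes. */
static int kasan_reads, kasan_writes;
static int kcsan_atomic_reads, kcsan_atomic_writes;

/* Stub checks: only record that they were called. */
static void kasan_check_read(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	++kasan_reads;
}

static void kasan_check_write(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	++kasan_writes;
}

static void kcsan_check_atomic_read(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	++kcsan_atomic_reads;
}

static void kcsan_check_atomic_write(const volatile void *p, size_t size)
{
	(void)p; (void)size;
	++kcsan_atomic_writes;
}

/* One call site per access kind fans out to both sanitizers. */
static inline void atomic_check_read(const volatile void *v, size_t size)
{
	kasan_check_read(v, size);
	kcsan_check_atomic_read(v, size);
}

static inline void atomic_check_write(const volatile void *v, size_t size)
{
	kasan_check_write(v, size);
	kcsan_check_atomic_write(v, size);
}

/* Example of a generated accessor: check first, then do the access. */
static int demo_value = 42;

static int demo_atomic_read(const volatile int *v)
{
	atomic_check_read(v, sizeof(*v));
	return *v;
}

static void demo_atomic_set(volatile int *v, int i)
{
	atomic_check_write(v, sizeof(*v));
	*v = i;
}
```

Keeping both checks behind a single helper means the generator script
emits one call per parameter, and a further sanitizer could be added by
editing only the two wrappers.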


* Re: [PATCH v2 8/8] x86, kcsan: Enable KCSAN for x86
  2019-10-17 14:13   ` Marco Elver
  (?)
@ 2019-10-22 12:59   ` Mark Rutland
  2019-10-22 13:02       ` Marco Elver
  -1 siblings, 1 reply; 88+ messages in thread
From: Mark Rutland @ 2019-10-22 12:59 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, npiggin, paulmck, peterz, tglx, will, kasan-dev,
	linux-arch, linux-doc, linux-efi, linux-kbuild, linux-kernel,
	linux-mm, x86

On Thu, Oct 17, 2019 at 04:13:05PM +0200, Marco Elver wrote:
> This patch enables KCSAN for x86, with updates to build rules to not use
> KCSAN for several incompatible compilation units.
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Document build exceptions where no previous above comment explained
>   why we cannot instrument.
> ---
>  arch/x86/Kconfig                      | 1 +
>  arch/x86/boot/Makefile                | 2 ++
>  arch/x86/boot/compressed/Makefile     | 2 ++
>  arch/x86/entry/vdso/Makefile          | 3 +++
>  arch/x86/include/asm/bitops.h         | 6 +++++-
>  arch/x86/kernel/Makefile              | 7 +++++++
>  arch/x86/kernel/cpu/Makefile          | 3 +++
>  arch/x86/lib/Makefile                 | 4 ++++
>  arch/x86/mm/Makefile                  | 3 +++
>  arch/x86/purgatory/Makefile           | 2 ++
>  arch/x86/realmode/Makefile            | 3 +++
>  arch/x86/realmode/rm/Makefile         | 3 +++
>  drivers/firmware/efi/libstub/Makefile | 2 ++
>  13 files changed, 40 insertions(+), 1 deletion(-)

> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 0460c7581220..693d0a94b118 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -31,7 +31,9 @@ KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>  				   -D__DISABLE_EXPORTS
>  
>  GCOV_PROFILE			:= n
> +# Sanitizer runtimes are unavailable and cannot be linked here.
>  KASAN_SANITIZE			:= n
> +KCSAN_SANITIZE			:= n
>  UBSAN_SANITIZE			:= n
>  OBJECT_FILES_NON_STANDARD	:= y

Not a big deal, but it might make sense to move the EFI stub exception
to patch 3 since it isn't x86 specific (and will also apply for arm64).

Otherwise this looks good to me.

Thanks,
Mark.


* Re: [PATCH v2 8/8] x86, kcsan: Enable KCSAN for x86
  2019-10-22 12:59   ` Mark Rutland
  2019-10-22 13:02       ` Marco Elver
@ 2019-10-22 13:02       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 13:02 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi, Linux Kbuild mailing list,
	LKML, Linux Memory Management List, the arch/x86 maintainers

On Tue, 22 Oct 2019 at 14:59, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Oct 17, 2019 at 04:13:05PM +0200, Marco Elver wrote:
> > This patch enables KCSAN for x86, with updates to build rules to not use
> > KCSAN for several incompatible compilation units.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Document build exceptions where no previous above comment explained
> >   why we cannot instrument.
> > ---
> >  arch/x86/Kconfig                      | 1 +
> >  arch/x86/boot/Makefile                | 2 ++
> >  arch/x86/boot/compressed/Makefile     | 2 ++
> >  arch/x86/entry/vdso/Makefile          | 3 +++
> >  arch/x86/include/asm/bitops.h         | 6 +++++-
> >  arch/x86/kernel/Makefile              | 7 +++++++
> >  arch/x86/kernel/cpu/Makefile          | 3 +++
> >  arch/x86/lib/Makefile                 | 4 ++++
> >  arch/x86/mm/Makefile                  | 3 +++
> >  arch/x86/purgatory/Makefile           | 2 ++
> >  arch/x86/realmode/Makefile            | 3 +++
> >  arch/x86/realmode/rm/Makefile         | 3 +++
> >  drivers/firmware/efi/libstub/Makefile | 2 ++
> >  13 files changed, 40 insertions(+), 1 deletion(-)
>
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 0460c7581220..693d0a94b118 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -31,7 +31,9 @@ KBUILD_CFLAGS                       := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
> >                                  -D__DISABLE_EXPORTS
> >
> >  GCOV_PROFILE                 := n
> > +# Sanitizer runtimes are unavailable and cannot be linked here.
> >  KASAN_SANITIZE                       := n
> > +KCSAN_SANITIZE                       := n
> >  UBSAN_SANITIZE                       := n
> >  OBJECT_FILES_NON_STANDARD    := y
>
> Not a big deal, but it might make sense to move the EFI stub exception
> to patch 3 since it isn't x86 specific (and will also apply for arm64).

Thanks for spotting, moved for v3.

-- Marco

> Otherwise this looks good to me.
>
> Thanks,
> Mark.


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
  (?)
@ 2019-10-22 14:11   ` Mark Rutland
  2019-10-22 16:52       ` Marco Elver
  -1 siblings, 1 reply; 88+ messages in thread
From: Mark Rutland @ 2019-10-22 14:11 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, npiggin, paulmck, peterz, tglx, will, kasan-dev,
	linux-arch, linux-doc, linux-efi, linux-kbuild, linux-kernel,
	linux-mm, x86

Hi Marco,

On Thu, Oct 17, 2019 at 04:12:58PM +0200, Marco Elver wrote:
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
> 
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].

This is generally looking good to me.

I have a few comments below. Those are mostly style and naming things to
minimize surprise, though I also have a couple of queries (nested vs
flat atomic regions and the number of watchpoints).

[...]

> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +	int disable; /* disable counter */

Can we call this disable_count? That would match the convention used for
preempt_count, and make it clear this isn't a boolean.

> +	int atomic_next; /* number of following atomic ops */

I'm a little unclear on why we need this given the begin ... end
helpers -- isn't knowing that we're in an atomic region sufficient?

> +
> +	/*
> +	 * We use separate variables to store if we are in a nestable or flat
> +	 * atomic region. This helps make sure that an atomic region with
> +	 * nesting support is not suddenly aborted when a flat region is
> +	 * contained within. Effectively this allows supporting nesting flat
> +	 * atomic regions within an outer nestable atomic region. Support for
> +	 * this is required as there are cases where a seqlock reader critical
> +	 * section (flat atomic region) is contained within a seqlock writer
> +	 * critical section (nestable atomic region), and the "mismatching
> +	 * kcsan_end_atomic()" warning would trigger otherwise.
> +	 */
> +	int atomic_region;
> +	bool atomic_region_flat;
> +};

I think we need to introduce nestability and flatness first. How about:

	/*
	 * Some atomic sequences are flat, and cannot contain another
	 * atomic sequence. Other atomic sequences are nestable, and may
	 * contain other flat and/or nestable sequences.
	 *
	 * For example, a seqlock writer critical section is nestable
	 * and may contain a seqlock reader critical section, which is
	 * flat.
	 *
	 * To support this we track the depth of nesting, and whether
	 * the leaf level is flat.
	 */
	int atomic_nest_count;
	bool in_flat_atomic;

That said, I'm not entirely clear on the distinction. Why would nesting
a reader within another reader not be legitimate?

> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +

Similarly to the check_{read,write}() naming, could we get rid of the
bool argument and split this into separate nestable and flat functions?

That makes it easier to read in-context, e.g.

	kcsan_nestable_atomic_begin();
	...
	kcsan_nestable_atomic_end();

... has a more obvious meaning than:

	kcsan_begin_atomic(true);
	...
	kcsan_end_atomic(true);

... and putting the begin/end at the end of the name makes it easier to
spot the matching pair.
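
To make the suggestion concrete, here is a minimal user-space sketch of
the split interface, assuming the atomic_nest_count and in_flat_atomic
fields proposed above. The kcsan_flat_atomic_*() names, struct model_ctx
and in_atomic_region() are hypothetical, and this is not the code from
the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the per-context state. */
struct model_ctx {
	int atomic_nest_count;	/* depth of nestable atomic regions */
	bool in_flat_atomic;	/* true while inside a flat atomic region */
};

static struct model_ctx ctx;

static void kcsan_nestable_atomic_begin(void)
{
	++ctx.atomic_nest_count;
}

static void kcsan_nestable_atomic_end(void)
{
	/* A mismatched end would drive the count negative. */
	assert(ctx.atomic_nest_count > 0);
	--ctx.atomic_nest_count;
}

/* Assumed counterpart names for the flat variant. */
static void kcsan_flat_atomic_begin(void)
{
	ctx.in_flat_atomic = true;
}

static void kcsan_flat_atomic_end(void)
{
	ctx.in_flat_atomic = false;
}

/* Accesses are treated as atomic while either kind of region is open. */
static bool in_atomic_region(void)
{
	return ctx.atomic_nest_count > 0 || ctx.in_flat_atomic;
}
```

With this shape, ending an inner flat region leaves the outer nestable
region intact, which is the writer-contains-reader case described in the
comment.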

[...]

> +static inline bool is_enabled(void)
> +{
> +	return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}

Can we please make this kcsan_is_enabled(), to avoid confusion with
IS_ENABLED()?

> +static inline unsigned int get_delay(void)
> +{
> +	unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +					     CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +	return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +		       ((prandom_u32() % max_delay) + 1) :
> +		       max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +	BUG_ON(!in_task());
> +
> +	kcsan_debugfs_init();
> +	kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +	/*
> +	 * We are in the init task, and no other tasks should be running.
> +	 */
> +	WRITE_ONCE(kcsan_enabled, true);
> +#endif

Where possible, please use IS_ENABLED() rather than ifdeffery for
portions of functions like this, e.g.

	/*
	 * We are in the init task, and no other tasks should be running.
	 */
	if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
		WRITE_ONCE(kcsan_enabled, true);

That makes code a bit easier to read, and ensures that the code always
gets build coverage, so it's less likely that code changes will
introduce a build failure when the option is enabled.
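
As a rough illustration of why this works, below is a simplified
user-space stand-in for the macro. It is a sketch only: the MODEL_* and
model_* names and both CONFIG_MODEL_* options are made up, and the real
IS_ENABLED() additionally handles =m options.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Simplified stand-in for IS_ENABLED(): expands to 1 if the option is
 * defined to 1, and to 0 if the option is not defined at all.
 */
#define MODEL_ARG_PLACEHOLDER_1 0,
#define model_take_second_arg(ignored, val, ...) val
#define model_is_defined3(arg1_or_junk) \
	model_take_second_arg(arg1_or_junk 1, 0, 0)
#define model_is_defined2(val) model_is_defined3(MODEL_ARG_PLACEHOLDER_##val)
#define MODEL_IS_ENABLED(option) model_is_defined2(option)

/* Hypothetical options: one set, one deliberately left undefined. */
#define CONFIG_MODEL_EARLY_ENABLE 1
/* CONFIG_MODEL_DEBUG is not defined. */

static int model_enabled;

static void model_init(void)
{
	/* Both branches are always parsed and type-checked. */
	if (MODEL_IS_ENABLED(CONFIG_MODEL_EARLY_ENABLE))
		model_enabled = 1;

	/* Constant-false condition; the compiler can drop this branch. */
	if (MODEL_IS_ENABLED(CONFIG_MODEL_DEBUG))
		printf("model: debug output enabled\n");
}
```

Because the condition is a compile-time constant, the disabled branch
costs nothing at runtime, yet it still has to compile.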

[...]

> +#ifdef CONFIG_KCSAN_DEBUG
> +	kcsan_disable_current();
> +	pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +	       is_write ? "write" : "read", size, ptr,
> +	       watchpoint_slot((unsigned long)ptr),
> +	       encode_watchpoint((unsigned long)ptr, size, is_write));
> +	kcsan_enable_current();
> +#endif

This can use IS_ENABLED(), e.g.

	if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
		kcsan_disable_current();
		pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
		       is_write ? "write" : "read", size, ptr,
		       watchpoint_slot((unsigned long)ptr),
		       encode_watchpoint((unsigned long)ptr, size, is_write));
		kcsan_enable_current();
	}

[...]
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +		kcsan_report(ptr, size, is_write, smp_processor_id(),
> +			     kcsan_report_race_unknown_origin);
> +#endif

This can also use IS_ENABLED().

[...]

> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64

Is there any documentation as to how 64 was chosen? It's fine if it's
arbitrary, but it would be good to know either way.

I wonder if this is something that might need to scale with NR_CPUS (or
nr_cpus).

> +enum kcsan_counter_id {
> +	/*
> +	 * Number of watchpoints currently in use.
> +	 */
> +	kcsan_counter_used_watchpoints,

Nit: typically enum values are capitalized (as coding-style.rst says).
That helps to make it clear each value is a constant rather than a
variable. Likewise for the other enums here.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
                     ` (2 preceding siblings ...)
  (?)
@ 2019-10-22 15:48   ` Oleg Nesterov
  2019-10-22 17:42       ` Marco Elver
  -1 siblings, 1 reply; 88+ messages in thread
From: Oleg Nesterov @ 2019-10-22 15:48 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, mark.rutland, npiggin, paulmck, peterz, tglx, will,
	kasan-dev, linux-arch, linux-doc, linux-efi, linux-kbuild,
	linux-kernel, linux-mm, x86

On 10/17, Marco Elver wrote:
>
> +	/*
> +	 * Delay this thread, to increase probability of observing a racy
> +	 * conflicting access.
> +	 */
> +	udelay(get_delay());
> +
> +	/*
> +	 * Re-read value, and check if it is as expected; if not, we infer a
> +	 * racy access.
> +	 */
> +	switch (size) {
> +	case 1:
> +		is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +		break;
> +	case 2:
> +		is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +		break;
> +	case 4:
> +		is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +		break;
> +	case 8:
> +		is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +		break;
> +	default:
> +		break; /* ignore; we do not diff the values */
> +	}
> +
> +	/* Check if this access raced with another. */
> +	if (!remove_watchpoint(watchpoint)) {
> +		/*
> +		 * No need to increment 'race' counter, as the racing thread
> +		 * already did.
> +		 */
> +		kcsan_report(ptr, size, is_write, smp_processor_id(),
> +			     kcsan_report_race_setup);
> +	} else if (!is_expected) {
> +		/* Inferring a race, since the value should not have changed. */
> +		kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +		kcsan_report(ptr, size, is_write, smp_processor_id(),
> +			     kcsan_report_race_unknown_origin);
> +#endif
> +	}

Not sure I understand this code...

Just for example. Suppose that task->state = TASK_UNINTERRUPTIBLE, this task
does __set_current_state(TASK_RUNNING), another CPU does wake_up_process(task)
which does the same UNINTERRUPTIBLE -> RUNNING transition.

Looks like, this is the "data race" according to kcsan?

Hmm. even the "if (!(p->state & state))" check in try_to_wake_up() can trigger
kcsan_report() ?
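
For reference, this is how I read the re-read check in the hunk above,
as a user-space sketch. All model_* names are hypothetical, and the
delay hook stands in for udelay() so that a test can modify memory in
the middle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hook standing in for udelay(). */
typedef void (*model_delay_fn)(void *arg);

/* Describes a store performed "concurrently" during the delay. */
struct model_store {
	uint32_t *addr;
	uint32_t val;
};

static void model_racy_store(void *arg)
{
	struct model_store *s = arg;

	*s->addr = s->val;
}

/*
 * Snapshot the value at ptr, run the delay, then re-read and compare.
 * Returns true if the value changed in between. Sizes other than
 * 1, 2, 4 and 8 are not compared, like the default case in the hunk.
 */
static bool model_value_changed(const void *ptr, size_t size,
				model_delay_fn delay, void *arg)
{
	uint64_t before = 0, after = 0;

	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;

	memcpy(&before, ptr, size);
	delay(arg);
	memcpy(&after, ptr, size);

	return before != after;
}
```

In this model a store of a different value during the delay is noticed,
while a store of the same value is invisible to the comparison, so if I
read the hunk correctly such a store could only show up through the
!remove_watchpoint() branch.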

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-22 14:11   ` Mark Rutland
  2019-10-22 16:52       ` Marco Elver
@ 2019-10-22 16:52       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 16:52 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi, Linux Kbuild mailing list,
	LKML, Linux Memory Management List, the arch/x86 maintainers

Hi Mark,

Thanks for your comments; see inline comments below.

On Tue, 22 Oct 2019 at 16:11, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Marco,
>
> On Thu, Oct 17, 2019 at 04:12:58PM +0200, Marco Elver wrote:
> > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> > kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> > See the included Documentation/dev-tools/kcsan.rst for more details.
> >
> > This patch adds basic infrastructure, but does not yet enable KCSAN for
> > any architecture.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Elaborate comment about instrumentation calls emitted by compilers.
> > * Replace kcsan_check_access(.., {true, false}) with
> >   kcsan_check_{read,write} for improved readability.
> > * Change bug title of race of unknown origin to just say "data-race in".
> > * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> > * Add comment about safety of find_watchpoint without user_access_save.
> > * Remove unnecessary preempt_disable/enable and elaborate on comment why
> >   we want to disable interrupts and preemptions.
> > * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
> >   contexts [Suggested by Mark Rutland].
>
> This is generally looking good to me.
>
> I have a few comments below. Those are mostly style and naming things to
> minimize surprise, though I also have a couple of queries (nested vs
> flat atomic regions and the number of watchpoints).
>
> [...]
>
> > diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> > new file mode 100644
> > index 000000000000..fd5de2ba3a16
> > --- /dev/null
> > +++ b/include/linux/kcsan.h
> > @@ -0,0 +1,108 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_H
> > +#define _LINUX_KCSAN_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +#ifdef CONFIG_KCSAN
> > +
> > +/*
> > + * Context for each thread of execution: for tasks, this is stored in
> > + * task_struct, and interrupts access internal per-CPU storage.
> > + */
> > +struct kcsan_ctx {
> > +     int disable; /* disable counter */
>
> Can we call this disable_count? That would match the convention used for
> preempt_count, and make it clear this isn't a boolean.

Done for v3.

> > +     int atomic_next; /* number of following atomic ops */
>
> I'm a little unclear on why we need this given the begin ... end
> helpers -- isn't knowing that we're in an atomic region sufficient?

Sadly no, this is all due to seqlock usage. See seqlock patch for explanation.

> > +
> > +     /*
> > +      * We use separate variables to store if we are in a nestable or flat
> > +      * atomic region. This helps make sure that an atomic region with
> > +      * nesting support is not suddenly aborted when a flat region is
> > +      * contained within. Effectively this allows supporting nesting flat
> > +      * atomic regions within an outer nestable atomic region. Support for
> > +      * this is required as there are cases where a seqlock reader critical
> > +      * section (flat atomic region) is contained within a seqlock writer
> > +      * critical section (nestable atomic region), and the "mismatching
> > +      * kcsan_end_atomic()" warning would trigger otherwise.
> > +      */
> > +     int atomic_region;
> > +     bool atomic_region_flat;
> > +};
>
> I think we need to introduce nestability and flatness first. How about:

Thanks, updated wording to read better hopefully.

>         /*
>          * Some atomic sequences are flat, and cannot contain another
>          * atomic sequence. Other atomic sequences are nestable, and may
>          * contain other flat and/or nestable sequences.
>          *
>          * For example, a seqlock writer critical section is nestable
>          * and may contain a seqlock reader critical section, which is
>          * flat.
>          *
>          * To support this we track the depth of nesting, and whether
>          * the leaf level is flat.
>          */
>         int atomic_nest_count;
>         bool in_flat_atomic;
>
> That said, I'm not entirely clear on the distinction. Why would nesting
> a reader within another reader not be legitimate?

It is legitimate; however, seqlock reader critical sections do not
always have a balanced begin/end. I ran into trouble initially when
readers were still nestable, as e.g. read_seqcount_retry can be called
multiple times. See seqlock patch for more explanations.
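
As a rough illustration of the imbalance (a user-space sketch with
hypothetical model_* names, not the seqlock instrumentation itself):
if one begin is followed by two end calls, as can happen when the retry
check runs more than once, a depth counter goes negative, whereas a flag
simply stays cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: reader regions tracked with a depth counter. */
static int model_reader_depth;

static void model_nestable_reader_begin(void)
{
	++model_reader_depth;
}

static void model_nestable_reader_end(void)
{
	--model_reader_depth;	/* underflows if called once too often */
}

/* Hypothetical model: reader regions tracked with a single flag. */
static bool model_reader_flat;

static void model_flat_reader_begin(void)
{
	model_reader_flat = true;
}

static void model_flat_reader_end(void)
{
	model_reader_flat = false;	/* idempotent: extra calls are harmless */
}
```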

> > +
> > +/**
> > + * kcsan_init - initialize KCSAN runtime
> > + */
> > +void kcsan_init(void);
> > +
> > +/**
> > + * kcsan_disable_current - disable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_disable_current(void);
> > +
> > +/**
> > + * kcsan_enable_current - re-enable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_enable_current(void);
> > +
> > +/**
> > + * kcsan_begin_atomic - use to denote an atomic region
> > + *
> > + * Accesses within the atomic region may appear to race with other accesses but
> > + * should be considered atomic.
> > + *
> > + * @nest true if regions may be nested, or false for flat region
> > + */
> > +void kcsan_begin_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_end_atomic - end atomic region
> > + *
> > + * @nest must match argument to kcsan_begin_atomic().
> > + */
> > +void kcsan_end_atomic(bool nest);
> > +
>
> Similarly to the check_{read,write}() naming, could we get rid of the
> bool argument and split this into separate nestable and flat functions?
>
> That makes it easier to read in-context, e.g.
>
>         kcsan_nestable_atomic_begin();
>         ...
>         kcsan_nestable_atomic_end();
>
> ... has a more obvious meaning than:
>
>         kcsan_begin_atomic(true);
>         ...
>         kcsan_end_atomic(true);
>
> ... and putting the begin/end at the end of the name makes it easier to
> spot the matching pair.

Thanks, done for v3.

> [...]
>
> > +static inline bool is_enabled(void)
> > +{
> > +     return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> > +}
>
> Can we please make this kcsan_is_enabled(), to avoid confusion with
> IS_ENABLED()?

Done for v3.

> > +static inline unsigned int get_delay(void)
> > +{
> > +     unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> > +                                          CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> > +     return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                    ((prandom_u32() % max_delay) + 1) :
> > +                    max_delay;
> > +}
> > +
> > +/* === Public interface ===================================================== */
> > +
> > +void __init kcsan_init(void)
> > +{
> > +     BUG_ON(!in_task());
> > +
> > +     kcsan_debugfs_init();
> > +     kcsan_enable_current();
> > +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> > +     /*
> > +      * We are in the init task, and no other tasks should be running.
> > +      */
> > +     WRITE_ONCE(kcsan_enabled, true);
> > +#endif
>
> Where possible, please use IS_ENABLED() rather than ifdeffery for
> portions of functions like this, e.g.
>
>         /*
>          * We are in the init task, and no other tasks should be running.
>          */
>         if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
>                 WRITE_ONCE(kcsan_enabled, true);
>
> That makes code a bit easier to read, and ensures that the code always
> gets build coverage, so it's less likely that code changes will
> introduce a build failure when the option is enabled.

Thanks, done for v3.

> [...]
>
> > +#ifdef CONFIG_KCSAN_DEBUG
> > +     kcsan_disable_current();
> > +     pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +            is_write ? "write" : "read", size, ptr,
> > +            watchpoint_slot((unsigned long)ptr),
> > +            encode_watchpoint((unsigned long)ptr, size, is_write));
> > +     kcsan_enable_current();
> > +#endif
>
> This can use IS_ENABLED(), e.g.
>
>         if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
>                 kcsan_disable_current();
>                 pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
>                        is_write ? "write" : "read", size, ptr,
>                        watchpoint_slot((unsigned long)ptr),
>                        encode_watchpoint((unsigned long)ptr, size, is_write));
>                 kcsan_enable_current();
>         }
>
> [...]
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_unknown_origin);
> > +#endif
>
> This can also use IS_ENABLED().

Done for v3.

> [...]
>
> > diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> > new file mode 100644
> > index 000000000000..429479b3041d
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.h
> > @@ -0,0 +1,140 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_KCSAN_H
> > +#define _MM_KCSAN_KCSAN_H
> > +
> > +#include <linux/kcsan.h>
> > +
> > +/*
> > + * Total number of watchpoints. An address range maps into a specific slot as
> > + * specified in `encoding.h`. Although larger number of watchpoints may not even
> > + * be usable due to limited thread count, a larger value will improve
> > + * performance due to reducing cache-line contention.
> > + */
> > +#define KCSAN_NUM_WATCHPOINTS 64
>
> Is there any documentation as to how 64 was chosen? It's fine if it's
> arbitrary, but it would be good to know either way.

It was arbitrary in the sense that I chose the largest value that I
think is an acceptable overhead in terms of storage, i.e. on 64-bit
the watchpoints consume 512 bytes. It should always be large enough so
that the "no_capacity" counter does not increase frequently.

> I wonder if this is something that might need to scale with NR_CPUS (or
> nr_cpus).

I think this is hard to say. I've decided to make it configurable in
v3, with a BUILD_BUG_ON to ensure its value is within expected bounds.
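
Roughly along these lines, as a user-space sketch. The MODEL_* names and
the bounds are placeholders picked for illustration, _Static_assert
stands in for BUILD_BUG_ON, and one long-sized slot per watchpoint is
assumed to match the 512-byte figure above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig-provided value. */
#define MODEL_NUM_WATCHPOINTS 64

/* Hypothetical bounds, chosen only for illustration. */
#define MODEL_MIN_WATCHPOINTS 1
#define MODEL_MAX_WATCHPOINTS 1024

/* User-space analogue of BUILD_BUG_ON(): fail the build if out of bounds. */
_Static_assert(MODEL_NUM_WATCHPOINTS >= MODEL_MIN_WATCHPOINTS &&
	       MODEL_NUM_WATCHPOINTS <= MODEL_MAX_WATCHPOINTS,
	       "MODEL_NUM_WATCHPOINTS out of bounds");

/* Assumption: one long-sized slot per watchpoint. */
static long model_watchpoints[MODEL_NUM_WATCHPOINTS];

/* Storage used by the watchpoint array, in bytes. */
static size_t model_watchpoint_bytes(void)
{
	return sizeof(model_watchpoints);
}
```

With an 8-byte long this comes to 64 * 8 = 512 bytes.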

> > +enum kcsan_counter_id {
> > +     /*
> > +      * Number of watchpoints currently in use.
> > +      */
> > +     kcsan_counter_used_watchpoints,
>
> Nit: typically enum values are capitalized (as coding-style.rst says).
> That helps to make it clear each value is a constant rather than a
> variable. Likewise for the other enums here.

Done for v3.

Thanks,
-- Marco

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-22 16:52       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 16:52 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet

Hi Mark,

Thanks for your comments; see inline comments below.

On Tue, 22 Oct 2019 at 16:11, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Marco,
>
> On Thu, Oct 17, 2019 at 04:12:58PM +0200, Marco Elver wrote:
> > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> > kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> > See the included Documentation/dev-tools/kcsan.rst for more details.
> >
> > This patch adds basic infrastructure, but does not yet enable KCSAN for
> > any architecture.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Elaborate comment about instrumentation calls emitted by compilers.
> > * Replace kcsan_check_access(.., {true, false}) with
> >   kcsan_check_{read,write} for improved readability.
> > * Change bug title of race of unknown origin to just say "data-race in".
> > * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> > * Add comment about safety of find_watchpoint without user_access_save.
> > * Remove unnecessary preempt_disable/enable and elaborate on comment why
> >   we want to disable interrupts and preemptions.
> > * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
> >   contexts [Suggested by Mark Rutland].
>
> This is generally looking good to me.
>
> I have a few comments below. Those are mostly style and naming things to
> minimize surprise, though I also have a couple of queries (nested vs
> flat atomic regions and the number of watchpoints).
>
> [...]
>
> > diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> > new file mode 100644
> > index 000000000000..fd5de2ba3a16
> > --- /dev/null
> > +++ b/include/linux/kcsan.h
> > @@ -0,0 +1,108 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _LINUX_KCSAN_H
> > +#define _LINUX_KCSAN_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +#ifdef CONFIG_KCSAN
> > +
> > +/*
> > + * Context for each thread of execution: for tasks, this is stored in
> > + * task_struct, and interrupts access internal per-CPU storage.
> > + */
> > +struct kcsan_ctx {
> > +     int disable; /* disable counter */
>
> Can we call this disable_count? That would match the convention used for
> preempt_count, and make it clear this isn't a boolean.

Done for v3.

> > +     int atomic_next; /* number of following atomic ops */
>
> I'm a little unclear on why we need this given the begin ... end
> helpers -- isn't knowing that we're in an atomic region sufficient?

Sadly no, this is all due to seqlock usage. See seqlock patch for explanation.

> > +
> > +     /*
> > +      * We use separate variables to store if we are in a nestable or flat
> > +      * atomic region. This helps make sure that an atomic region with
> > +      * nesting support is not suddenly aborted when a flat region is
> > +      * contained within. Effectively this allows supporting nesting flat
> > +      * atomic regions within an outer nestable atomic region. Support for
> > +      * this is required as there are cases where a seqlock reader critical
> > +      * section (flat atomic region) is contained within a seqlock writer
> > +      * critical section (nestable atomic region), and the "mismatching
> > +      * kcsan_end_atomic()" warning would trigger otherwise.
> > +      */
> > +     int atomic_region;
> > +     bool atomic_region_flat;
> > +};
>
> I think we need to introduce nestability and flatness first. How about:

Thanks, I've updated the wording; hopefully it reads better now.

>         /*
>          * Some atomic sequences are flat, and cannot contain another
>          * atomic sequence. Other atomic sequences are nestable, and may
>          * contain other flat and/or nestable sequences.
>          *
>          * For example, a seqlock writer critical section is nestable
>          * and may contain a seqlock reader critical section, which is
>          * flat.
>          *
>          * To support this we track the depth of nesting, and whether
>          * the leaf level is flat.
>          */
>         int atomic_nest_count;
>         bool in_flat_atomic;
>
> That said, I'm not entirely clear on the distinction. Why would nesting
> a reader within another reader not be legitimate?

It is legitimate; however, seqlock reader critical sections do not
always have a balanced begin/end. I ran into trouble initially when
readers were still nestable, as e.g. read_seqcount_retry() can be called
multiple times. See the seqlock patch for more explanation.
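
To make the nestable/flat distinction concrete, here is a minimal userspace
sketch. It is not the KCSAN implementation: the demo_* names are
hypothetical, and only the two field names are taken from the suggestion
quoted above. The assumption is that nestable regions are counted, while
flat regions only set a flag, so an unbalanced flat "end" is harmless.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-context atomic-region state. */
struct demo_ctx {
	int atomic_nest_count;	/* depth of nestable regions */
	bool in_flat_atomic;	/* true while inside a flat region */
};

/* Nestable regions must be balanced: every begin needs one end. */
static void demo_nestable_atomic_begin(struct demo_ctx *ctx)
{
	ctx->atomic_nest_count++;
}

static void demo_nestable_atomic_end(struct demo_ctx *ctx)
{
	/* Models the "mismatching end" check for nestable regions. */
	assert(ctx->atomic_nest_count > 0);
	ctx->atomic_nest_count--;
}

/* Flat regions are idempotent: repeated begin/end calls do not accumulate. */
static void demo_flat_atomic_begin(struct demo_ctx *ctx)
{
	ctx->in_flat_atomic = true;
}

static void demo_flat_atomic_end(struct demo_ctx *ctx)
{
	ctx->in_flat_atomic = false;
}

/* Accesses are treated as atomic while either kind of region is open. */
static bool demo_in_atomic(const struct demo_ctx *ctx)
{
	return ctx->atomic_nest_count > 0 || ctx->in_flat_atomic;
}
```

With this model, a reader that ends its flat region more than once (as can
happen with repeated retry checks) cannot terminate an enclosing writer's
nestable region.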

> > +
> > +/**
> > + * kcsan_init - initialize KCSAN runtime
> > + */
> > +void kcsan_init(void);
> > +
> > +/**
> > + * kcsan_disable_current - disable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_disable_current(void);
> > +
> > +/**
> > + * kcsan_enable_current - re-enable KCSAN for the current context
> > + *
> > + * Supports nesting.
> > + */
> > +void kcsan_enable_current(void);
> > +
> > +/**
> > + * kcsan_begin_atomic - use to denote an atomic region
> > + *
> > + * Accesses within the atomic region may appear to race with other accesses but
> > + * should be considered atomic.
> > + *
> > + * @nest true if regions may be nested, or false for flat region
> > + */
> > +void kcsan_begin_atomic(bool nest);
> > +
> > +/**
> > + * kcsan_end_atomic - end atomic region
> > + *
> > + * @nest must match argument to kcsan_begin_atomic().
> > + */
> > +void kcsan_end_atomic(bool nest);
> > +
>
> Similarly to the check_{read,write}() naming, could we get rid of the
> bool argument and split this into separate nestable and flat functions?
>
> That makes it easier to read in-context, e.g.
>
>         kcsan_nestable_atomic_begin();
>         ...
>         kcsan_nestable_atomic_end();
>
> ... has a more obvious meaning than:
>
>         kcsan_begin_atomic(true);
>         ...
>         kcsan_end_atomic(true);
>
> ... and putting the begin/end at the end of the name makes it easier to
> spot the matching pair.

Thanks, done for v3.

> [...]
>
> > +static inline bool is_enabled(void)
> > +{
> > +     return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> > +}
>
> Can we please make this kcsan_is_enabled(), to avoid confusion with
> IS_ENABLED()?

Done for v3.

> > +static inline unsigned int get_delay(void)
> > +{
> > +     unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> > +                                          CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> > +     return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                    ((prandom_u32() % max_delay) + 1) :
> > +                    max_delay;
> > +}
> > +
> > +/* === Public interface ===================================================== */
> > +
> > +void __init kcsan_init(void)
> > +{
> > +     BUG_ON(!in_task());
> > +
> > +     kcsan_debugfs_init();
> > +     kcsan_enable_current();
> > +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> > +     /*
> > +      * We are in the init task, and no other tasks should be running.
> > +      */
> > +     WRITE_ONCE(kcsan_enabled, true);
> > +#endif
>
> Where possible, please use IS_ENABLED() rather than ifdeffery for
> portions of functions like this, e.g.
>
>         /*
>          * We are in the init task, and no other tasks should be running.
>          */
>         if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
>                 WRITE_ONCE(kcsan_enabled, true);
>
> That makes code a bit easier to read, and ensures that the code always
> gets build coverage, so it's less likely that code changes will
> introduce a build failure when the option is enabled.

Thanks, done for v3.
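
For readers unfamiliar with why the IS_ENABLED() form still gets build
coverage, the following is a simplified userspace imitation of the
preprocessor trick. The DEMO_* macros and CONFIG_DEMO_* options are
hypothetical stand-ins, not the kernel's definitions; the point is only
that the test expands to a constant 1 or 0, so the guarded code is always
parsed and type-checked, and the compiler can drop the dead branch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * If the option is defined to 1, the pasted token becomes
 * DEMO_ARG_PLACEHOLDER_1, which expands to "0," and shifts a 1 into the
 * second argument position. Otherwise the pasted token is junk and the
 * second argument is 0.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define CONFIG_DEMO_EARLY_ENABLE 1
/* CONFIG_DEMO_DEBUG is deliberately left undefined. */

/* The result is an integer constant expression, usable at compile time. */
static_assert(DEMO_IS_DEFINED(CONFIG_DEMO_EARLY_ENABLE) == 1, "option is on");
static_assert(DEMO_IS_DEFINED(CONFIG_DEMO_DEBUG) == 0, "option is off");

static bool demo_enabled;

static void demo_init(void)
{
	/* Always compiled; the branch is removed when the option is off. */
	if (DEMO_IS_DEFINED(CONFIG_DEMO_EARLY_ENABLE))
		demo_enabled = true;
}

static int demo_debug_on(void)
{
	return DEMO_IS_DEFINED(CONFIG_DEMO_DEBUG);
}
```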

> [...]
>
> > +#ifdef CONFIG_KCSAN_DEBUG
> > +     kcsan_disable_current();
> > +     pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +            is_write ? "write" : "read", size, ptr,
> > +            watchpoint_slot((unsigned long)ptr),
> > +            encode_watchpoint((unsigned long)ptr, size, is_write));
> > +     kcsan_enable_current();
> > +#endif
>
> This can use IS_ENABLED(), e.g.
>
>         if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
>                 kcsan_disable_current();
>                 pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
>                        is_write ? "write" : "read", size, ptr,
>                        watchpoint_slot((unsigned long)ptr),
>                        encode_watchpoint((unsigned long)ptr, size, is_write));
>                 kcsan_enable_current();
>         }
>
> [...]
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_unknown_origin);
> > +#endif
>
> This can also use IS_ENABLED().

Done for v3.

> [...]
>
> > diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> > new file mode 100644
> > index 000000000000..429479b3041d
> > --- /dev/null
> > +++ b/kernel/kcsan/kcsan.h
> > @@ -0,0 +1,140 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _MM_KCSAN_KCSAN_H
> > +#define _MM_KCSAN_KCSAN_H
> > +
> > +#include <linux/kcsan.h>
> > +
> > +/*
> > + * Total number of watchpoints. An address range maps into a specific slot as
> > + * specified in `encoding.h`. Although larger number of watchpoints may not even
> > + * be usable due to limited thread count, a larger value will improve
> > + * performance due to reducing cache-line contention.
> > + */
> > +#define KCSAN_NUM_WATCHPOINTS 64
>
> Is there any documentation as to how 64 was chosen? It's fine if it's
> arbitrary, but it would be good to know either way.

It was arbitrary in the sense that I chose the largest value that I
think has an acceptable storage overhead, i.e. on 64-bit, the 64
watchpoints consume 512 bytes (8 bytes each). It should always be large
enough that the "no_capacity" counter does not increase frequently.

> I wonder if this is something that might need to scale with NR_CPUS (or
> nr_cpus).

I think this is hard to say. I've decided to make it configurable in
v3, with a BUILD_BUG_ON to ensure its value is within expected bounds.
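
As a rough illustration of the storage arithmetic and of a compile-time
bounds check, here is a userspace sketch. The bounds, the 4096-byte page
size, and the slot mapping are assumptions made for the example (the real
mapping lives in encoding.h and the real limits are not stated in this
thread); static_assert stands in for BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a configurable watchpoint count. */
#define DEMO_NUM_WATCHPOINTS 64

/* One encoded watchpoint per slot; 8 bytes each on a 64-bit build. */
static uint64_t demo_watchpoints[DEMO_NUM_WATCHPOINTS];

/* Hypothetical bounds, checked at build time. */
static_assert(DEMO_NUM_WATCHPOINTS >= 1 && DEMO_NUM_WATCHPOINTS <= 4096,
	      "DEMO_NUM_WATCHPOINTS out of expected bounds");

/* Total storage used by the watchpoint array. */
static size_t demo_watchpoint_bytes(void)
{
	return sizeof(demo_watchpoints);
}

/* Assumed mapping: page-granular address index modulo the slot count. */
static size_t demo_watchpoint_slot(uintptr_t addr)
{
	return (addr / 4096) % DEMO_NUM_WATCHPOINTS;
}
```

With 64 slots this gives 64 * 8 = 512 bytes, matching the figure above.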

> > +enum kcsan_counter_id {
> > +     /*
> > +      * Number of watchpoints currently in use.
> > +      */
> > +     kcsan_counter_used_watchpoints,
>
> Nit: typically enum values are capitalized (as coding-style.rst says).
> That helps to make it clear each value is a constant rather than a
> variable. Likewise for the other enums here.

Done for v3.

Thanks,
-- Marco

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-22 15:48   ` Oleg Nesterov
  2019-10-22 17:42       ` Marco Elver
@ 2019-10-22 17:42       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 17:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Mark Rutland, Nicholas Piggin, Paul E. McKenney,
	Peter Zijlstra, Thomas Gleixner, Will Deacon, kasan-dev,
	linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On Tue, 22 Oct 2019 at 17:49, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 10/17, Marco Elver wrote:
> >
> > +     /*
> > +      * Delay this thread, to increase probability of observing a racy
> > +      * conflicting access.
> > +      */
> > +     udelay(get_delay());
> > +
> > +     /*
> > +      * Re-read value, and check if it is as expected; if not, we infer a
> > +      * racy access.
> > +      */
> > +     switch (size) {
> > +     case 1:
> > +             is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +             break;
> > +     case 2:
> > +             is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +             break;
> > +     case 4:
> > +             is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +             break;
> > +     case 8:
> > +             is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +             break;
> > +     default:
> > +             break; /* ignore; we do not diff the values */
> > +     }
> > +
> > +     /* Check if this access raced with another. */
> > +     if (!remove_watchpoint(watchpoint)) {
> > +             /*
> > +              * No need to increment 'race' counter, as the racing thread
> > +              * already did.
> > +              */
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_setup);
> > +     } else if (!is_expected) {
> > +             /* Inferring a race, since the value should not have changed. */
> > +             kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_unknown_origin);
> > +#endif
> > +     }
>
> Not sure I understand this code...
>
> Just for example. Suppose that task->state = TASK_UNINTERRUPTIBLE, this task
> does __set_current_state(TASK_RUNNING), another CPU does wake_up_process(task)
> which does the same UNINTERRUPTIBLE -> RUNNING transition.
>
> Looks like, this is the "data race" according to kcsan?

Yes, they are "data races". They are probably not "race conditions" though.

This is a fair distinction to make, and we never claimed to find "race
conditions" only -- race conditions are logic bugs that result in bad
state due to unexpected interleaving of threads. Data races are more
subtle, and become relevant at the programming language level.

In Documentation we summarize: "Informally, two operations conflict if
they access the same memory location, and at least one of them is a
write operation. In an execution, two memory operations from different
threads form a data-race if they conflict, at least one of them is a
*plain* access (non-atomic), and they are unordered in the
"happens-before" order according to the LKMM."
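
The informal definition can be written down as a small predicate. This is
only a sketch of the wording above, not of how KCSAN detects races: the
struct and function names are hypothetical, accesses are compared by exact
address (size and overlap are ignored), and whether two accesses are
ordered by happens-before is simply passed in as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a single memory access. */
struct demo_access {
	uintptr_t addr;
	int thread;
	bool is_write;
	bool is_plain;	/* false for marked accesses (READ_ONCE(), atomics) */
};

/* Two operations conflict if they access the same location and at least
 * one of them is a write. */
static bool demo_conflict(const struct demo_access *a,
			  const struct demo_access *b)
{
	return a->addr == b->addr && (a->is_write || b->is_write);
}

/* A data race: conflicting accesses from different threads, at least one
 * of them plain, and not ordered by happens-before. */
static bool demo_data_race(const struct demo_access *a,
			   const struct demo_access *b,
			   bool ordered_by_hb)
{
	assert(a && b);
	return a->thread != b->thread && demo_conflict(a, b) &&
	       (a->is_plain || b->is_plain) && !ordered_by_hb;
}
```

Under this predicate, two unordered plain writes of the same value to the
same location from different CPUs still count as a data race, which is the
case raised in the example above.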

KCSAN's goal is to find *data races* according to the LKMM. Some data
races are race conditions (usually the more interesting bugs) -- but
not *all* data races are race conditions. Data races that are not race
conditions are what are usually referred to as "benign", but they can
still become bugs on the wrong arch/compiler combination. Hence the
need to annotate these accesses with READ_ONCE, WRITE_ONCE or use
atomic_t:
- https://lwn.net/Articles/793253/
- https://lwn.net/Articles/799218/
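
As a sketch of what such an annotation looks like, the following userspace
approximation marks accesses with volatile casts. The DEMO_* macros are
simplified stand-ins (the kernel's READ_ONCE/WRITE_ONCE handle more cases),
they rely on the GNU __typeof__ extension, and demo_state is a hypothetical
variable standing in for a field such as task->state.

```c
#include <assert.h>

/* Simplified marked accesses: the volatile cast stops the compiler from
 * tearing, fusing, or re-loading the access. */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

static long demo_state;

/* This simplified form is only meant for machine-word-sized scalars. */
static_assert(sizeof(demo_state) <= sizeof(long long), "scalar only");

/* Writer side: a marked store instead of a plain assignment. */
static void demo_set_state(long s)
{
	DEMO_WRITE_ONCE(demo_state, s);
}

/* Reader side: a marked load instead of a plain read. */
static long demo_get_state(void)
{
	return DEMO_READ_ONCE(demo_state);
}
```

Marking both sides documents that the concurrent access is intentional; it
does not by itself add any ordering between the two threads.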

> Hmm. even the "if (!(p->state & state))" check in try_to_wake_up() can trigger
> kcsan_report() ?

We blacklisted sched (KCSAN_SANITIZE := n in kernel/sched/Makefile),
so these data races won't actually be reported.

Thanks,
-- Marco

> Oleg.
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-22 17:42       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 17:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet

On Tue, 22 Oct 2019 at 17:49, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 10/17, Marco Elver wrote:
> >
> > +     /*
> > +      * Delay this thread, to increase probability of observing a racy
> > +      * conflicting access.
> > +      */
> > +     udelay(get_delay());
> > +
> > +     /*
> > +      * Re-read value, and check if it is as expected; if not, we infer a
> > +      * racy access.
> > +      */
> > +     switch (size) {
> > +     case 1:
> > +             is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +             break;
> > +     case 2:
> > +             is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +             break;
> > +     case 4:
> > +             is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +             break;
> > +     case 8:
> > +             is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +             break;
> > +     default:
> > +             break; /* ignore; we do not diff the values */
> > +     }
> > +
> > +     /* Check if this access raced with another. */
> > +     if (!remove_watchpoint(watchpoint)) {
> > +             /*
> > +              * No need to increment 'race' counter, as the racing thread
> > +              * already did.
> > +              */
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_setup);
> > +     } else if (!is_expected) {
> > +             /* Inferring a race, since the value should not have changed. */
> > +             kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_unknown_origin);
> > +#endif
> > +     }
>
> Not sure I understand this code...
>
> Just for example. Suppose that task->state = TASK_UNINTERRUPTIBLE, this task
> does __set_current_state(TASK_RUNNING), another CPU does wake_up_process(task)
> which does the same UNINTERRUPTIBLE -> RUNNING transition.
>
> Looks like, this is the "data race" according to kcsan?

Yes, they are "data races". They are probably not "race conditions" though.

This is a fair distinction to make, and we never claimed to find "race
conditions" only -- race conditions are logic bugs that result in bad
state due to unexpected interleaving of threads. Data races are more
subtle, and become relevant at the programming language level.

In Documentation we summarize: "Informally, two operations conflict if
they access the same memory location, and at least one of them is a
write operation. In an execution, two memory operations from different
threads form a data-race if they conflict, at least one of them is a
*plain* access (non-atomic), and they are unordered in the
"happens-before" order according to the LKMM."

KCSAN's goal is to find *data races* according to the LKMM.  Some data
races are race conditions (usually the more interesting bugs) -- but
not *all* data races are race conditions. Those are what are usually
referred to as "benign", but they can still become bugs on the wrong
arch/compiler combination. Hence, the need to annotate these accesses
with READ_ONCE, WRITE_ONCE or use atomic_t:
- https://lwn.net/Articles/793253/
- https://lwn.net/Articles/799218/

> Hmm. even the "if (!(p->state & state))" check in try_to_wake_up() can trigger
> kcsan_report() ?

We blacklisted sched (KCSAN_SANITIZE := n   in kernel/sched/Makefile),
so these data races won't actually be reported.

Thanks,
-- Marco

> Oleg.
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-22 17:42       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 17:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Mark Rutland, Nicholas Piggin, Paul E. McKenney,
	Peter Zijlstra, Thomas Gleixner, Will Deacon, kasan-dev,
	linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On Tue, 22 Oct 2019 at 17:49, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 10/17, Marco Elver wrote:
> >
> > +     /*
> > +      * Delay this thread, to increase probability of observing a racy
> > +      * conflicting access.
> > +      */
> > +     udelay(get_delay());
> > +
> > +     /*
> > +      * Re-read value, and check if it is as expected; if not, we infer a
> > +      * racy access.
> > +      */
> > +     switch (size) {
> > +     case 1:
> > +             is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +             break;
> > +     case 2:
> > +             is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +             break;
> > +     case 4:
> > +             is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +             break;
> > +     case 8:
> > +             is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +             break;
> > +     default:
> > +             break; /* ignore; we do not diff the values */
> > +     }
> > +
> > +     /* Check if this access raced with another. */
> > +     if (!remove_watchpoint(watchpoint)) {
> > +             /*
> > +              * No need to increment 'race' counter, as the racing thread
> > +              * already did.
> > +              */
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_setup);
> > +     } else if (!is_expected) {
> > +             /* Inferring a race, since the value should not have changed. */
> > +             kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> > +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> > +             kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                          kcsan_report_race_unknown_origin);
> > +#endif
> > +     }
>
> Not sure I understand this code...
>
> Just for example. Suppose that task->state = TASK_UNINTERRUPTIBLE, this task
> does __set_current_state(TASK_RUNNING), another CPU does wake_up_process(task)
> which does the same UNINTERRUPTIBLE -> RUNNING transition.
>
> Looks like, this is the "data race" according to kcsan?

Yes, they are "data races". They are probably not "race conditions" though.

This is a fair distinction to make, and we never claimed to find "race
conditions" only -- race conditions are logic bugs that result in bad
state due to unexpected interleaving of threads. Data races are more
subtle, and become relevant at the programming language level.

In Documentation we summarize: "Informally, two operations conflict if
they access the same memory location, and at least one of them is a
write operation. In an execution, two memory operations from different
threads form a data-race if they conflict, at least one of them is a
*plain* access (non-atomic), and they are unordered in the
"happens-before" order according to the LKMM."

KCSAN's goal is to find *data races* according to the LKMM.  Some data
races are race conditions (usually the more interesting bugs) -- but
not *all* data races are race conditions. The ones that are not are
usually referred to as "benign", but they can still become bugs on the
wrong arch/compiler combination. Hence the need to annotate these
accesses with READ_ONCE, WRITE_ONCE or use atomic_t:
- https://lwn.net/Articles/793253/
- https://lwn.net/Articles/799218/
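
To make the annotation point concrete, here is a minimal user-space
sketch (not kernel code). The READ_ONCE/WRITE_ONCE stand-ins use the
classic volatile-cast formulation, which is an assumption made for
illustration; the kernel's real definitions live in
include/linux/compiler.h. struct fake_task and the helper names are
hypothetical.

```c
#include <assert.h>

/*
 * User-space stand-ins for the kernel macros (assumption: simplified
 * volatile-cast versions, for illustration only).
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define TASK_RUNNING         0
#define TASK_UNINTERRUPTIBLE 2

/* Hypothetical stand-in, not the kernel's struct task_struct. */
struct fake_task {
	long state;
};

/*
 * Plain store: if another thread accesses ->state concurrently, this
 * is a data race, and the compiler is free to tear or fuse the store.
 */
static inline void set_running_plain(struct fake_task *t)
{
	t->state = TASK_RUNNING;
}

/* Marked store: tells compiler and tooling the concurrency is intended. */
static inline void set_running_marked(struct fake_task *t)
{
	WRITE_ONCE(t->state, TASK_RUNNING);
}

/* Marked load: the value is read exactly once, without re-loading. */
static inline int is_sleeping_marked(struct fake_task *t)
{
	return READ_ONCE(t->state) != TASK_RUNNING;
}
```

Both stores produce the same value in a single-threaded run; the
difference only matters once a second thread touches ->state, which is
the case a data-race detector is meant to flag for the plain variant.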

> Hmm. even the "if (!(p->state & state))" check in try_to_wake_up() can trigger
> kcsan_report() ?

We blacklisted sched (KCSAN_SANITIZE := n in kernel/sched/Makefile),
so these data races won't actually be reported.

Thanks,
-- Marco

> Oleg.
>


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 7/8] locking/atomics, kcsan: Add KCSAN instrumentation
  2019-10-22 12:33   ` Mark Rutland
  2019-10-22 18:17       ` Marco Elver
@ 2019-10-22 18:17       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-22 18:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi, Linux Kbuild mailing list,
	LKML, Linux Memory Management List, the arch/x86 maintainers

On Tue, 22 Oct 2019 at 14:33, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Oct 17, 2019 at 04:13:04PM +0200, Marco Elver wrote:
> > This adds KCSAN instrumentation to atomic-instrumented.h.
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Use kcsan_check{,_atomic}_{read,write} instead of
> >   kcsan_check_{access,atomic}.
> > * Introduce __atomic_check_{read,write} [Suggested by Mark Rutland].
> > ---
> >  include/asm-generic/atomic-instrumented.h | 393 +++++++++++-----------
> >  scripts/atomic/gen-atomic-instrumented.sh |  17 +-
> >  2 files changed, 218 insertions(+), 192 deletions(-)
>
> The script changes and generated code look fine to me, so FWIW:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Great, thank you Mark!

> Thanks,
> Mark.
>
> > diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
> > index e09812372b17..8b8b2a6f8d68 100755
> > --- a/scripts/atomic/gen-atomic-instrumented.sh
> > +++ b/scripts/atomic/gen-atomic-instrumented.sh
> > @@ -20,7 +20,7 @@ gen_param_check()
> >       # We don't write to constant parameters
> >       [ ${type#c} != ${type} ] && rw="read"
> >
> > -     printf "\tkasan_check_${rw}(${name}, sizeof(*${name}));\n"
> > +     printf "\t__atomic_check_${rw}(${name}, sizeof(*${name}));\n"
> >  }
> >
> >  #gen_param_check(arg...)
> > @@ -107,7 +107,7 @@ cat <<EOF
> >  #define ${xchg}(ptr, ...)                                            \\
> >  ({                                                                   \\
> >       typeof(ptr) __ai_ptr = (ptr);                                   \\
> > -     kasan_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));          \\
> > +     __atomic_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));               \\
> >       arch_${xchg}(__ai_ptr, __VA_ARGS__);                            \\
> >  })
> >  EOF
> > @@ -148,6 +148,19 @@ cat << EOF
> >
> >  #include <linux/build_bug.h>
> >  #include <linux/kasan-checks.h>
> > +#include <linux/kcsan-checks.h>
> > +
> > +static inline void __atomic_check_read(const volatile void *v, size_t size)
> > +{
> > +     kasan_check_read(v, size);
> > +     kcsan_check_atomic_read(v, size);
> > +}
> > +
> > +static inline void __atomic_check_write(const volatile void *v, size_t size)
> > +{
> > +     kasan_check_write(v, size);
> > +     kcsan_check_atomic_write(v, size);
> > +}
> >
> >  EOF
> >
> > --
> > 2.23.0.866.gb869b98d4c-goog
> >
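
For anyone reading the diff without the generated header at hand, here
is a rough user-space sketch of what the xchg template would expand to
with an empty ${mult}. The sanitizer hooks are stubs that only count
calls, arch_xchg is a hypothetical stand-in built on a compiler
builtin, and the helper is named demo_atomic_check_write here (instead
of the patch's __atomic_check_write) to stay clear of compiler-reserved
names in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Stub hooks: the real ones are declared in the kasan/kcsan check headers. */
static int kasan_calls;
static int kcsan_calls;

static inline void kasan_check_write(const volatile void *v, size_t size)
{
	assert(v != NULL && size > 0);
	kasan_calls++;
}

static inline void kcsan_check_atomic_write(const volatile void *v, size_t size)
{
	assert(v != NULL && size > 0);
	kcsan_calls++;
}

/* Same shape as the helper the patch adds: one call site, both checks. */
static inline void demo_atomic_check_write(const volatile void *v, size_t size)
{
	kasan_check_write(v, size);
	kcsan_check_atomic_write(v, size);
}

/* Hypothetical arch-level primitive, for illustration only. */
#define arch_xchg(ptr, val) \
	__atomic_exchange_n((ptr), (val), __ATOMIC_SEQ_CST)

/* Instrumented wrapper: check first, then defer to the arch primitive. */
#define xchg(ptr, ...)							\
({									\
	__typeof__(ptr) __ai_ptr = (ptr);				\
	demo_atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
	arch_xchg(__ai_ptr, __VA_ARGS__);				\
})
```

The point of routing through a single helper is that the generator
script only needs to emit one check per parameter, and both sanitizers
see every atomic access.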

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-23  9:41     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23  9:41 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes are a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows to determine the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
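
As an illustration of packing access type, size, and address into one
word, here is a small sketch. The bit layout below is hypothetical and
assumes a 64-bit word; it is not claimed to match what
kernel/kcsan/encoding.h actually does, and all wp_* names are made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout (assumption, 64-bit word):
 *   bit  63     : 1 if the watched access is a write
 *   bits 62..56 : access size in bytes (1..127)
 *   bits 55..0  : low 56 bits of the address
 */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 56
#define WP_SIZE_MASK  (UINT64_C(0x7f) << WP_SIZE_SHIFT)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)

static inline uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size >= 1 && size <= 0x7f);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       (addr & WP_ADDR_MASK);
}

static inline void wp_decode(uint64_t wp, uint64_t *addr, size_t *size,
			     bool *is_write)
{
	*addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_SIZE_SHIFT);
	*is_write = (wp & WP_WRITE_BIT) != 0;
}

/* True if the access [addr, addr + size) overlaps the watched range. */
static inline bool wp_matches(uint64_t wp, uint64_t addr, size_t size)
{
	uint64_t wp_addr;
	size_t wp_size;
	bool wp_write;

	wp_decode(wp, &wp_addr, &wp_size, &wp_write);
	addr &= WP_ADDR_MASK;
	return wp_addr < addr + size && addr < wp_addr + wp_size;
}
```

Because the whole watchpoint fits in one word, it can in principle be
published and consumed with single-word atomic operations, which is
what makes a lock-free fast path plausible.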
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall some delay.

Is it grammatically correct "to stall a delay"? Shouldn't stall be
used with "for"?


> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
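
The three steps above can be sketched in a single-threaded simulation
as follows. This is a simplified illustration under stated assumptions,
not the code in kernel/kcsan/core.c: there is one watchpoint slot
instead of an array, the "delay" is a hook that lets a test inject a
concurrent access, and SAMPLE_INTERVAL and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_INTERVAL 4	/* hypothetical: sample every 4th access */

struct watchpoint {
	const void *addr;
	size_t size;
	bool is_write;
	bool armed;
};

static struct watchpoint wp;		/* a single slot, for simplicity */
static unsigned int access_count;
static int races_reported;
static int races_unknown_origin;
static void (*delay_hook)(void);	/* stands in for the stall */

static bool overlaps(const struct watchpoint *w, const void *ptr, size_t size)
{
	uintptr_t a = (uintptr_t)w->addr, b = (uintptr_t)ptr;

	return a < b + size && b < a + w->size;
}

static void check_access(const void *ptr, size_t size, bool is_write)
{
	unsigned char before[sizeof(long)];

	/* Step 1: matching watchpoint and at least one write => race. */
	if (wp.armed && overlaps(&wp, ptr, size) && (wp.is_write || is_write)) {
		wp.armed = false;	/* consume it so the owner can tell */
		races_reported++;
		return;
	}

	/* Step 2: only some accesses set up a watchpoint. */
	if (wp.armed || size > sizeof(before) ||
	    ++access_count % SAMPLE_INTERVAL != 0)
		return;

	wp = (struct watchpoint){ ptr, size, is_write, true };
	memcpy(before, ptr, size);	/* Step 3: value before the delay */

	if (delay_hook)
		delay_hook();

	/*
	 * If the watchpoint is still armed, no instrumented access hit it;
	 * a changed value then implies a race of unknown origin.
	 */
	if (wp.armed && memcmp(before, ptr, size) != 0)
		races_unknown_origin++;
	wp.armed = false;
}

static int shared_value;

/* Simulated concurrent writer that goes through the instrumentation. */
static void racing_writer(void)
{
	check_access(&shared_value, sizeof(shared_value), true);
	shared_value++;
}

/* Simulated writer without instrumentation (e.g. a device doing DMA). */
static void uninstrumented_writer(void)
{
	shared_value++;
}
```

Setting delay_hook to racing_writer and issuing SAMPLE_INTERVAL reads
of shared_value exercises step 1; switching the hook to
uninstrumented_writer exercises the value re-check in step 3.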
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.

Is it a correct number? 10-15x does not look particularly fast. TSAN
is much faster.
If it's a correct number, perhaps we need to tune the defaults to get
it to a reasonable level.
Also, what's the minimal level of overhead with infinitely large
sampling? That may be a useful number to provide as well.

> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.

Well, it is definitely aware of some of them, e.g. program order :)
It's just not aware of some of the finer details.

> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.

A common term I've seen for this is completeness/incompleteness.
Do we want to mention Soundness? That's more important for any dynamic tool.


> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.

The whole doc is sprinkled with explicit and implicit comparisons with
and diffs on top of a happens-before-based detector (starting from the
very first sentences, KCSAN is effectively defined as being
"not-KTSAN"). This is reasonable for us at this particular point in
time, but it's not so reasonable for most users of this doc and for
the future. No happens-before race detector officially exists for the
kernel. I would consider adding a separate section for alternative
approaches, rationale and comparison with a happens-before-based
detector. Such a section would be a good place to talk about our
previous experience and e.g. shadow memory. Currently the first thing
you say about memory overhead is "No shadow memory is required", and I
am like "why are you even mentioning this? and what is even shadow
memory?".

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23  9:41     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23  9:41 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes are a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows to determine the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
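
To make "stores access type, size, and address in a long" concrete, here
is a small sketch of one possible packing. The bit layout, the field
widths and the wp_* names are assumptions for illustration, not the
encoding in kernel/kcsan/encoding.h, and uint64_t is used instead of
long so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, chosen for illustration only:
 *   bit  63     : 1 if the watched access is a write
 *   bits 62..49 : access size in bytes (14 bits)
 *   bits 48..0  : low bits of the address
 * The value 0 is reserved to mean "empty slot".
 */
#define WP_WRITE_BIT (UINT64_C(1) << 63)
#define WP_SIZE_BITS 14
#define WP_ADDR_BITS (63 - WP_SIZE_BITS)
#define WP_ADDR_MASK ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK ((UINT64_C(1) << WP_SIZE_BITS) - 1)

static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= WP_SIZE_MASK);

	return (is_write ? WP_WRITE_BIT : 0) |
	       (((uint64_t)size & WP_SIZE_MASK) << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Returns false for an empty slot, otherwise fills in the three fields. */
static bool wp_decode(uint64_t wp, uint64_t *addr, size_t *size, bool *is_write)
{
	if (wp == 0)
		return false;

	*addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_ADDR_BITS) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the whole watchpoint fits in one word, a slot can be published
and cleared with a single word-sized store, which is what makes a
lock-free fast path plausible. Note that this sketch keeps only the low
address bits, so a decoded address must be compared against a masked
address.
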
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall some delay.

Is "to stall a delay" grammatically correct? Shouldn't "stall" be
used with "for"?


> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
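
The three steps quoted above can be sketched in user space roughly as
follows. This is a single-threaded illustration under stated
assumptions: the struct, the slot count and the wp_* function names are
all hypothetical, the "delay" is simply whatever the caller does between
wp_begin() and wp_end(), and none of the real runtime's concurrency
handling is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 4
#define MAX_SIZE  8

/* Hypothetical soft watchpoint; not the encoding used by the patch. */
struct soft_wp {
	bool in_use;
	bool is_write;
	const unsigned char *addr;
	size_t size;
	unsigned char before[MAX_SIZE]; /* value snapshot taken before the delay */
};

static struct soft_wp slots[NUM_SLOTS];

/* Step 1: is there an overlapping watchpoint, with at least one write? */
static bool wp_find_conflict(const void *ptr, size_t size, bool is_write)
{
	uintptr_t a = (uintptr_t)ptr;

	for (int i = 0; i < NUM_SLOTS; i++) {
		const struct soft_wp *wp = &slots[i];
		uintptr_t w = (uintptr_t)wp->addr;

		if (wp->in_use && (is_write || wp->is_write) &&
		    a < w + wp->size && w < a + size)
			return true;
	}
	return false;
}

/* Step 2: claim a slot and snapshot the value. Returns a slot index or -1. */
static int wp_begin(const void *ptr, size_t size, bool is_write)
{
	if (size == 0 || size > MAX_SIZE)
		return -1;

	for (int i = 0; i < NUM_SLOTS; i++) {
		if (!slots[i].in_use) {
			slots[i].in_use = true;
			slots[i].is_write = is_write;
			slots[i].addr = ptr;
			slots[i].size = size;
			memcpy(slots[i].before, ptr, size);
			return i;
		}
	}
	return -1;
}

/* Step 3: after the delay, release the slot and report a value change. */
static bool wp_end(int slot)
{
	bool changed;

	assert(slot >= 0 && slot < NUM_SLOTS && slots[slot].in_use);

	changed = memcmp(slots[slot].before, slots[slot].addr,
			 slots[slot].size) != 0;
	slots[slot].in_use = false;
	return changed;
}
```

In this model an atomic access would call only wp_find_conflict() and
never wp_begin(), matching the statement that no watchpoint is ever set
up on atomic accesses.
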
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.

Is this number correct? 10-15x does not look particularly fast. TSAN
is much faster.
If it is correct, perhaps we need to tune the defaults to get
it to a reasonable level.
Also, what's the minimal level of overhead with an infinitely large
sampling interval? That may be a useful number to provide as well.

> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.

Well, it is definitely aware of some of them, e.g. program order :)
It's just not aware of some of the finer details.

> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.

A common term I've seen for this is completeness/incompleteness.
Do we want to mention soundness? That's more important for any dynamic tool.


> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.

The whole doc is sprinkled with explicit and implicit comparisons with,
and diffs on top of, a happens-before-based detector (starting from the
very first sentences, KCSAN is effectively defined as being
"not-KTSAN"). This is reasonable for us at this particular point in
time, but it's not so reasonable for most users of this doc or for
the future. No happens-before race detector officially exists for the
kernel. I would consider adding a separate section for alternative
approaches, rationale and comparison with a happens-before-based
detector. Such a section would be a good place to talk about our
previous experience and e.g. shadow memory. Currently the first thing
you say about memory overhead is "No shadow memory is required", and I
am like "why are you even mentioning this? and what is even shadow
memory?".

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-23  9:56     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23  9:56 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);

This cast should not be required, or we should fix kcsan_check_atomic_read.


> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);

This cast should not be required, or we should fix kcsan_check_atomic_write.
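
To illustrate why the casts should be unnecessary, here is a minimal
stand-alone sketch. It assumes the checker takes a const volatile void *
parameter, as __kcsan_check_watchpoint is declared in the quoted
kcsan-checks.h; the function names below are hypothetical stand-ins, not
the patch's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static size_t bytes_checked; /* bookkeeping for the illustration only */

/* Hypothetical stand-in for a checker taking a const volatile pointer. */
static bool check_watchpoint(const volatile void *ptr, size_t size,
			     bool is_write)
{
	assert(ptr != NULL);
	(void)is_write;
	bytes_checked += size;
	return true;
}

/* Reader: p already has the parameter's type, so it is passed as is. */
static void read_bytes(const volatile void *p, void *res, size_t size)
{
	const volatile unsigned char *src = p;
	unsigned char *dst = res;

	check_watchpoint(p, size, false);
	for (size_t i = 0; i < size; i++)
		dst[i] = src[i];
}

/* Writer: volatile void * converts implicitly, since it only adds const. */
static void write_bytes(volatile void *p, const void *res, size_t size)
{
	volatile unsigned char *dst = p;
	const unsigned char *src = res;

	check_watchpoint(p, size, true);
	for (size_t i = 0; i < size; i++)
		dst[i] = src[i];
}
```

Both calls compile without a cast because C allows implicitly adding
qualifiers to the pointed-to type. Casting to const void * first, as in
the quoted hunks, drops volatile only for the parameter to add it back.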

>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used

Aren't all functions in this file always available? They should be.

> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
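
To double-check the probe order described in the comment, here is a minimal userspace sketch of the macro. The values of KCSAN_CHECK_ADJACENT and KCSAN_NUM_WATCHPOINTS are assumptions (the real definitions are in kcsan.h, which is not quoted here), and probe_index() is a hypothetical helper that exists only for this illustration.

```c
#include <assert.h>

/* Assumed values; the real ones are defined in kcsan.h (not quoted here). */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64

#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                                      \
	((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
		  KCSAN_CHECK_ADJACENT)) %                                     \
	 KCSAN_NUM_WATCHPOINTS)

/* Hypothetical helper: index probed on iteration i for an access in 'slot'. */
static int probe_index(int slot, int i)
{
	return SLOT_IDX(slot, i);
}

/* Self-check of the probe order: slot itself, right neighbour, left neighbour. */
static void check_probe_order(void)
{
	assert(probe_index(10, 0) == 10);
	assert(probe_index(10, 1) == 11);
	assert(probe_index(10, 2) == 9);
	assert(probe_index(63, 1) == 0); /* right neighbour wraps around */
}
```

Under these assumptions, and with plain int operands, probe_index(0, 2) evaluates to -1 because C's % truncates toward zero. It may be worth confirming that watchpoint_slot() and the constant types rule this case out.
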
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
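
For readers following the slot life cycle (insert, consume, remove), a rough userspace model using C11 atomics may help. This is only a sketch: slot_insert(), slot_consume() and slot_remove() are hypothetical names, and the relaxed C11 operations stand in for the kernel's atomic_long_*_relaxed() helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L
#define CONSUMED_WATCHPOINT 1L

/* Claim an empty slot; fails if the slot already holds a watchpoint. */
static bool slot_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded, memory_order_relaxed,
		memory_order_relaxed);
}

/* Mark the slot consumed, but only if it still holds the matched value. */
static bool slot_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &encoded, CONSUMED_WATCHPOINT, memory_order_relaxed,
		memory_order_relaxed);
}

/* Release the slot; returns false if a racing access consumed it. */
static bool slot_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}

/* Walk through one racy and one quiet life cycle of a single slot. */
static void check_slot_lifecycle(void)
{
	atomic_long wp;

	atomic_init(&wp, INVALID_WATCHPOINT);
	assert(slot_insert(&wp, 0x1234));   /* watcher sets up */
	assert(slot_consume(&wp, 0x1234));  /* racing access consumes */
	assert(!slot_consume(&wp, 0x1234)); /* a second racer loses */
	assert(!slot_remove(&wp));          /* watcher sees it was consumed */
	assert(slot_insert(&wp, 0x5678));   /* slot is reusable */
	assert(slot_remove(&wp));           /* no race this time */
}
```

The exchange in slot_remove() is what lets the watching thread learn about a race without any extra state: a consumed slot is the signal.
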
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupts, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
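
A small model of the two region variables helped me see why the flat flag is kept separate from the nesting counter. This is a sketch, not the patch's code: struct ctx_model and the helper names are hypothetical, and the "mismatching" warning path is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two region fields of struct kcsan_ctx. */
struct ctx_model {
	int atomic_region;       /* depth of nestable regions */
	bool atomic_region_flat; /* set while inside a flat region */
};

static void begin_atomic(struct ctx_model *c, bool nest)
{
	if (nest)
		++c->atomic_region;
	else
		c->atomic_region_flat = true;
}

/* The mismatch warning from the patch is omitted in this sketch. */
static void end_atomic(struct ctx_model *c, bool nest)
{
	if (nest)
		--c->atomic_region;
	else
		c->atomic_region_flat = false;
}

static bool in_atomic(const struct ctx_model *c)
{
	return c->atomic_region > 0 || c->atomic_region_flat;
}

/* Flat region (seqlock reader) nested inside a nestable one (writer). */
static void check_flat_inside_nestable(void)
{
	struct ctx_model c = { 0, false };

	begin_atomic(&c, true);
	begin_atomic(&c, false);
	end_atomic(&c, false);
	assert(in_atomic(&c)); /* outer region survives the inner end */
	end_atomic(&c, true);
	assert(!in_atomic(&c));
}
```

With a single shared variable, ending the inner flat region would also end the outer one, which is the case the comment in struct kcsan_ctx describes.
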
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races on e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true: whitelist; false: blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
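
The lazy sort plus binary search, with the result inverted for whitelists, can be modelled in userspace roughly as follows. This is a sketch under assumptions: struct filterlist and skip_report() are hypothetical stand-ins, locking and the kallsyms lookup are left out, and libc qsort()/bsearch() replace the kernel's sort()/bsearch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Hypothetical stand-in for report_filterlist; no locking in this sketch. */
struct filterlist {
	unsigned long *addrs;
	size_t used;
	bool sorted;
	bool whitelist;
};

/* Returns true if a report for the function at func_start should be skipped. */
static bool skip_report(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	if (fl->used == 0)
		return false; /* an empty list filters nothing */

	/* Sort lazily, only when entries were added since the last lookup. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_start, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip everything else. */
	return fl->whitelist ? !found : found;
}
```

Note that, as in the quoted code, an empty whitelist skips nothing rather than everything.
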
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
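
One detail of the command parsing that a sketch makes visible: because each keyword is compared with strncmp() over the keyword's own length, any input that merely starts with a keyword is accepted. The enum and parse_cmd() below are hypothetical names used only to model the dispatch order in userspace.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical command codes, one per branch of the quoted dispatch. */
enum cmd {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_FILTER,
	CMD_INVALID,
};

/* Mirrors the prefix comparisons; 'arg' is assumed to be stripped already. */
static enum cmd parse_cmd(const char *arg)
{
	if (!strncmp(arg, "on", sizeof("on") - 1))
		return CMD_ON;
	if (!strncmp(arg, "off", sizeof("off") - 1))
		return CMD_OFF;
	if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
		return CMD_WHITELIST;
	if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
		return CMD_BLACKLIST;
	if (arg[0] == '!')
		return CMD_FILTER;
	return CMD_INVALID;
}

static void check_prefix_matching(void)
{
	assert(parse_cmd("on") == CMD_ON);
	assert(parse_cmd("online") == CMD_ON); /* prefix match is enough */
	assert(parse_cmd("!some_func") == CMD_FILTER);
	assert(parse_cmd("bogus") == CMD_INVALID);
}
```

If exact matches are intended, comparing with strcmp() after strstrip() would reject inputs such as "online".
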
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS) bits;
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, WATCHPOINT_ADDR_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(WATCHPOINT_ADDR_BITS - 1, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
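
To make the layout in encoding.h concrete, here is a minimal userspace sketch of the round trip. It assumes 64-bit words, 4 KiB pages and KCSAN_CHECK_ADJACENT == 1, so that the size field needs bits_per(8192) = 14 bits. The sk_* names are hypothetical, and the masks are derived directly from the bit counts, which is an assumption of this sketch rather than a copy of the macros above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 64-bit words, 4 KiB pages, KCSAN_CHECK_ADJACENT == 1. */
#define SK_BITS       64
#define SK_SIZE_BITS  14 /* bits needed to represent 2 * 4096 = 8192 */
#define SK_ADDR_BITS  (SK_BITS - 1 - SK_SIZE_BITS)

/* Layout, from the top bit down: is-write | size | truncated address. */
#define SK_WRITE_MASK (UINT64_C(1) << (SK_BITS - 1))
#define SK_ADDR_MASK  ((UINT64_C(1) << SK_ADDR_BITS) - 1)
#define SK_SIZE_MASK  (~(SK_WRITE_MASK | SK_ADDR_MASK))

static uint64_t sk_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? SK_WRITE_MASK : 0) |
	       ((uint64_t)size << SK_ADDR_BITS) |
	       (addr & SK_ADDR_MASK);
}

static void sk_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp & SK_SIZE_MASK) >> SK_ADDR_BITS);
	*is_write = (wp & SK_WRITE_MASK) != 0;
}

static void sk_roundtrip_selfcheck(void)
{
	const uint64_t addr = UINT64_C(0xffff888012345678);
	uint64_t masked;
	size_t size;
	bool is_write;

	sk_decode(sk_encode(addr, 8, true), &masked, &size, &is_write);
	assert(masked == (addr & SK_ADDR_MASK));
	assert(size == 8);
	assert(is_write);

	/* The largest encodable size still fits below the is-write bit. */
	sk_decode(sk_encode(addr, 8192, false), &masked, &size, &is_write);
	assert(size == 8192);
	assert(!is_write);

	/* Addresses differing only in discarded upper bits encode identically. */
	assert(sk_encode(addr, 8, true) ==
	       sk_encode(addr ^ (UINT64_C(1) << 50), 8, true));
}
```

The last assertion is the aliasing case the comment describes: the encoding alone cannot tell the two addresses apart, which is why the report path compares the unmasked addresses again.
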
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
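
For readers unfamiliar with the token-pasting pattern in DEFINE_TSAN_READ_WRITE(), here is a userspace sketch of the same idea. The demo_* names and the recording stubs are hypothetical stand-ins for the __tsan_* entry points and __kcsan_check_{read,write}; the macro ends in a redundant prototype so that the trailing semicolon at the use site is consumed, much as EXPORT_SYMBOL() does in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the KCSAN checks; they only record the call. */
static size_t last_size;
static int last_is_write = -1;

static void demo_check_read(void *ptr, size_t size)
{
	(void)ptr;
	last_size = size;
	last_is_write = 0;
}

static void demo_check_write(void *ptr, size_t size)
{
	(void)ptr;
	last_size = size;
	last_is_write = 1;
}

/* One macro use generates a fixed-size read and write entry point. */
#define DEFINE_DEMO_READ_WRITE(size)                                           \
	void demo_tsan_read##size(void *ptr)                                   \
	{                                                                      \
		demo_check_read(ptr, size);                                    \
	}                                                                      \
	void demo_tsan_write##size(void *ptr)                                  \
	{                                                                      \
		demo_check_write(ptr, size);                                   \
	}                                                                      \
	void demo_tsan_write##size(void *ptr)

DEFINE_DEMO_READ_WRITE(1);
DEFINE_DEMO_READ_WRITE(8);

static void demo_selfcheck(void)
{
	long v = 0;

	demo_tsan_read8(&v);
	assert(last_size == 8 && last_is_write == 0);

	demo_tsan_write1(&v);
	assert(last_size == 1 && last_is_write == 1);
}
```

The access size is baked into the function name by the compiler's instrumentation, so each generated wrapper simply forwards it as a constant.
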
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
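
The interaction between KCSAN_NUM_WATCHPOINTS and KCSAN_CHECK_ADJACENT can be sketched in userspace as follows. The home slot computation matches watchpoint_slot() in encoding.h; how the neighbours are actually probed lives in core.c, which is not quoted here, so the "left, home, right with wrap-around" order below is purely an assumption of this sketch, as are the sk_* names and the 4 KiB page size.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE       4096u /* assumed page size */
#define SK_NUM_WATCHPOINTS 64
#define SK_CHECK_ADJACENT  1
#define SK_NUM_CANDIDATES  (1 + 2 * SK_CHECK_ADJACENT)

/* Same arithmetic as watchpoint_slot(): one slot per page, modulo the table. */
static int sk_slot(uint64_t addr)
{
	return (int)((addr / SK_PAGE_SIZE) % SK_NUM_WATCHPOINTS);
}

/*
 * Assumed probing order: left neighbours, the home slot, right neighbours,
 * wrapping around at the ends of the table.
 */
static void sk_candidate_slots(uint64_t addr, int out[SK_NUM_CANDIDATES])
{
	const int home = sk_slot(addr);
	int i;

	for (i = 0; i < SK_NUM_CANDIDATES; ++i)
		out[i] = (home - SK_CHECK_ADJACENT + i + SK_NUM_WATCHPOINTS) %
			 SK_NUM_WATCHPOINTS;
}

static void sk_slot_selfcheck(void)
{
	int c[SK_NUM_CANDIDATES];

	/* Addresses in the same page share a slot. */
	assert(sk_slot(0x1000) == 1 && sk_slot(0x1fff) == 1);
	/* Pages 64 slots apart collide. */
	assert(sk_slot(0x40000) == sk_slot(0x0));

	sk_candidate_slots(0x0, c);
	assert(c[0] == 63 && c[1] == 0 && c[2] == 1);

	/* An 8-byte access at 0x1ffc ends in the next page's slot. */
	sk_candidate_slots(0x1ffc, c);
	assert(c[2] == sk_slot(0x1ffc + 7));
}
```

Under these assumptions an access straddling a page boundary still finds a watchpoint placed in the neighbouring slot, while an access spanning more than two pages could miss one, which is the trade-off the comment describes.
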
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
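
The cases in test_matching_access() cover exact, partial and disjoint ranges. A userspace sketch of the same closed-interval overlap check, with a few extra cases (containment and a last-byte overlap), is below. sk_matching_access() is a hypothetical copy for illustration, and sizes are assumed to be non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two ranges overlap iff each one starts no later than the other one ends. */
static bool sk_matching_access(unsigned long addr1, size_t size1,
			       unsigned long addr2, size_t size2)
{
	const unsigned long end1 = addr1 + size1 - 1; /* last byte of range 1 */
	const unsigned long end2 = addr2 + size2 - 1; /* last byte of range 2 */

	return addr1 <= end2 && addr2 <= end1;
}

static void sk_matching_selfcheck(void)
{
	/* Cases taken from the selftest. */
	assert(sk_matching_access(10, 1, 10, 1));
	assert(sk_matching_access(10, 2, 11, 1));
	assert(sk_matching_access(10, 1, 9, 2));
	assert(!sk_matching_access(10, 1, 11, 1));
	assert(!sk_matching_access(9, 1, 10, 1));

	/* Extra cases: containment, last-byte overlap, just past the end. */
	assert(sk_matching_access(8, 8, 10, 1));
	assert(sk_matching_access(10, 4, 13, 4));
	assert(!sk_matching_access(10, 4, 14, 1));
}
```
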
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes the kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23  9:56     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23  9:56 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);

This cast should not be required. If it is, we should fix
kcsan_check_atomic_read rather than cast at the call site.
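
To illustrate why the cast looks redundant, here is a standalone sketch. The
demo_* names are made up for this example and are not part of the patch; the
only thing taken from the patch is the parameter type `const volatile void *`
used by kcsan_check_watchpoint. Passing a `const volatile void *` (or a
`volatile void *`) to such a parameter only keeps or adds qualifiers, so C
accepts it implicitly. A `(const void *)` cast would first drop `volatile`
and then have it added back by the callee's parameter type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for kcsan_check_watchpoint(); it only mirrors the
 * parameter types from the patch and does no real race checking.
 */
static bool demo_check_watchpoint(const volatile void *ptr, size_t size,
				  bool is_write)
{
	(void)is_write;
	return ptr != NULL && size > 0;
}

/* Mirrors how the atomic checks alias to the watchpoint check. */
#define demo_check_atomic_read(ptr, size)  demo_check_watchpoint(ptr, size, false)
#define demo_check_atomic_write(ptr, size) demo_check_watchpoint(ptr, size, true)

/* Same pointer type as __read_once_size(): passed through without a cast. */
static bool demo_read_once(const volatile void *p, int size)
{
	return demo_check_atomic_read(p, size);
}

/* Same pointer type as __write_once_size(): const is added implicitly. */
static bool demo_write_once(volatile void *p, int size)
{
	return demo_check_atomic_write(p, size);
}
```

Calling demo_read_once(&v, sizeof(v)) on a `volatile int v` compiles without
any pointer cast, which is the behaviour the real helpers should also allow.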


> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);

This cast should not be required. If it is, we should fix
kcsan_check_atomic_write rather than cast at the call site.

>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used

Aren't all functions in this file always available? They should be.
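
As a standalone sketch of what "always available" means here: DEMO_KCSAN is a
made-up stand-in for CONFIG_KCSAN, and the demo_* names are hypothetical and
not part of the patch. With the usual #ifdef/#else stub pattern, every name in
the header resolves in both configurations, so callers never need their own
#ifdef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* DEMO_KCSAN is hypothetical; leave it undefined to get the no-op stubs. */
#ifdef DEMO_KCSAN
bool demo_check_watchpoint(const volatile void *ptr, size_t size,
			   bool is_write);
void demo_setup_watchpoint(const volatile void *ptr, size_t size,
			   bool is_write);
#else
static inline bool demo_check_watchpoint(const volatile void *ptr,
					 size_t size, bool is_write)
{
	(void)ptr; (void)size; (void)is_write;
	return true; /* stub: never reports a race */
}
static inline void demo_setup_watchpoint(const volatile void *ptr,
					 size_t size, bool is_write)
{
	(void)ptr; (void)size; (void)is_write; /* stub: nothing to set up */
}
#endif

/*
 * Caller that compiles unchanged in both configurations. Returns 1 if the
 * check passed and the (possibly stubbed) setup step was reached.
 */
static inline int demo_check_read(const volatile void *ptr, size_t size)
{
	if (demo_check_watchpoint(ptr, size, false)) {
		demo_setup_watchpoint(ptr, size, false);
		return 1;
	}
	return 0;
}
```

With DEMO_KCSAN undefined, demo_check_read() always takes the setup path and
the compiler can discard both stub calls.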

> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
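
The snapshot / delay / re-read step in __kcsan_setup_watchpoint() can be
tried out in isolation. Below is a minimal userspace sketch, not the
kernel code: `union snapshot`, take_snapshot() and still_expected() are
hypothetical names, and plain volatile loads stand in for READ_ONCE().
As in the patch, only sizes 1, 2, 4 and 8 are compared; any other size
is treated as unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the expect_value union in the patch. */
union snapshot {
	uint8_t v1;
	uint16_t v2;
	uint32_t v4;
	uint64_t v8;
};

/* Record the current value; sizes other than 1/2/4/8 are not recorded. */
static void take_snapshot(union snapshot *s, const volatile void *ptr,
			  size_t size)
{
	assert(s != NULL && ptr != NULL);
	s->v8 = 0;
	switch (size) {
	case 1: s->v1 = *(const volatile uint8_t *)ptr; break;
	case 2: s->v2 = *(const volatile uint16_t *)ptr; break;
	case 4: s->v4 = *(const volatile uint32_t *)ptr; break;
	case 8: s->v8 = *(const volatile uint64_t *)ptr; break;
	default: break; /* ignore; values are not diffed */
	}
}

/* Re-read and compare; a mismatch means something wrote in between. */
static bool still_expected(const union snapshot *s, const volatile void *ptr,
			   size_t size)
{
	switch (size) {
	case 1: return s->v1 == *(const volatile uint8_t *)ptr;
	case 2: return s->v2 == *(const volatile uint16_t *)ptr;
	case 4: return s->v4 == *(const volatile uint32_t *)ptr;
	case 8: return s->v8 == *(const volatile uint64_t *)ptr;
	default: return true; /* cannot tell, assume unchanged */
	}
}
```

In the patch the delay sits between the two reads, so a write that
restores the original value before the re-read would go unnoticed by
this check alone.
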
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true: whitelist; false: blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
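
For readers unfamiliar with the pattern, the enum-indexed counter array
above maps fairly directly onto C11 atomics. This is only a userspace
sketch under that assumption; the demo_* names are made up and it uses
<stdatomic.h> rather than the kernel's atomic_long_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, shortened counter list; the last entry is the count. */
enum demo_counter_id {
	demo_counter_used_watchpoints,
	demo_counter_setup_watchpoints,
	demo_counter_count,
};

/* Static storage, so all counters start at zero. */
static atomic_long demo_counters[demo_counter_count];

static const char *demo_counter_name(enum demo_counter_id id)
{
	switch (id) {
	case demo_counter_used_watchpoints:
		return "used_watchpoints";
	case demo_counter_setup_watchpoints:
		return "setup_watchpoints";
	case demo_counter_count:
		break;
	}
	return NULL;
}

static void demo_counter_inc(enum demo_counter_id id)
{
	assert(id < demo_counter_count);
	atomic_fetch_add(&demo_counters[id], 1);
}

static void demo_counter_dec(enum demo_counter_id id)
{
	assert(id < demo_counter_count);
	atomic_fetch_sub(&demo_counters[id], 1);
}

static long demo_counter_read(enum demo_counter_id id)
{
	assert(id < demo_counter_count);
	return atomic_load(&demo_counters[id]);
}
```
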
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
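
The lookup in kcsan_skip_report() is "sort lazily, then binary search,
then invert for whitelist mode". A userspace sketch of the same logic,
assuming qsort()/bsearch() from libc in place of the kernel's sort() and
bsearch(), with locking and the kallsyms lookup left out; the struct and
function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified version of report_filterlist. */
struct demo_filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of elements used */
	bool sorted;          /* true once addrs[] has been sorted */
	bool whitelist;       /* true: whitelist; false: blacklist */
};

static int demo_cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for func_start should be suppressed. */
static bool demo_skip_report(struct demo_filterlist *fl,
			     unsigned long func_start)
{
	bool found;

	assert(fl != NULL);
	if (fl->used == 0)
		return false; /* empty list never filters, in either mode */

	/* Sort on first use after an insertion, then binary search. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(unsigned long),
		      demo_cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_start, fl->addrs, fl->used,
			sizeof(unsigned long), demo_cmp_addrs) != NULL;

	return fl->whitelist ? !found : found;
}
```

Note that, as in the patch, an empty list returns false before the
whitelist inversion, so an empty whitelist does not suppress everything.
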
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
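
The command dispatch in debugfs_write() compares only the first
strlen(keyword) characters, so any input that merely starts with a
keyword is accepted. The sketch below isolates that parsing in
userspace to make the behaviour easy to check; `enum demo_cmd` and
demo_parse() are hypothetical names, and the string is assumed to be
already stripped of surrounding whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_cmd {
	DEMO_CMD_ON,
	DEMO_CMD_OFF,
	DEMO_CMD_WHITELIST,
	DEMO_CMD_BLACKLIST,
	DEMO_CMD_FILTER, /* "!func": add func to the filter list */
	DEMO_CMD_INVALID,
};

/*
 * Classify one command, using the same prefix comparisons and the same
 * order as the patch. For DEMO_CMD_FILTER, *func points at the function
 * name following '!'; otherwise *func is NULL.
 */
static enum demo_cmd demo_parse(const char *arg, const char **func)
{
	assert(arg != NULL && func != NULL);
	*func = NULL;

	if (!strncmp(arg, "on", sizeof("on") - 1))
		return DEMO_CMD_ON;
	if (!strncmp(arg, "off", sizeof("off") - 1))
		return DEMO_CMD_OFF;
	if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
		return DEMO_CMD_WHITELIST;
	if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
		return DEMO_CMD_BLACKLIST;
	if (arg[0] == '!') {
		*func = &arg[1];
		return DEMO_CMD_FILTER;
	}
	return DEMO_CMD_INVALID;
}
```

With this logic "online" parses as "on" and "offset" as "off". If exact
matching is wanted, comparing with strcmp() after strstrip() would be
one option.
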
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This address encoding discards the upper (1 for is-write + SIZE_BITS) bits;
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both of these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
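
To illustrate the packing that encode_watchpoint()/decode_watchpoint()
perform, here is a userspace round-trip sketch. It is a simplification
under stated assumptions, not the patch's exact bit layout: a 64-bit
word, 14 size bits (enough for sizes up to 8192, i.e. 2 * PAGE_SIZE
with 4 KiB pages), and contiguous write/size/address fields. All DEMO_*
and demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SIZE   8192 /* assumed: 2 * 4096 */
#define DEMO_SIZE_BITS  14   /* bits needed to represent 8192 */
#define DEMO_ADDR_BITS  (64 - 1 - DEMO_SIZE_BITS) /* 49 */
#define DEMO_WRITE_MASK (UINT64_C(1) << 63)
#define DEMO_ADDR_MASK  ((UINT64_C(1) << DEMO_ADDR_BITS) - 1)
#define DEMO_SIZE_MASK  (~(DEMO_WRITE_MASK | DEMO_ADDR_MASK))

/* Pack is-write (bit 63), size (bits 62..49) and the low address bits. */
static uint64_t demo_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= DEMO_MAX_SIZE);
	return (is_write ? DEMO_WRITE_MASK : 0) |
	       ((uint64_t)size << DEMO_ADDR_BITS) |
	       (addr & DEMO_ADDR_MASK);
}

/* Values 0 and 1 are reserved (invalid / consumed) and do not decode. */
static bool demo_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & DEMO_ADDR_MASK;
	*size = (size_t)((wp & DEMO_SIZE_MASK) >> DEMO_ADDR_BITS);
	*is_write = (wp & DEMO_WRITE_MASK) != 0;
	return true;
}
```

Two addresses that differ only in the discarded upper bits encode to
the same value, which is the source of the encoding false positives
that report.c filters out.
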
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
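
watchpoint_slot() and matching_access() are small enough to check by
hand. The sketch below reproduces both in userspace with assumed
constants (4 KiB pages, 64 slots); the DEMO_* and demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE       4096UL /* assumed page size */
#define DEMO_NUM_WATCHPOINTS 64     /* value used in this patch */

/* All addresses within one page share a slot; slots wrap every 64 pages. */
static int demo_slot(unsigned long addr)
{
	return (int)((addr / DEMO_PAGE_SIZE) % DEMO_NUM_WATCHPOINTS);
}

/* True if the closed ranges [addr1, end1] and [addr2, end2] overlap. */
static bool demo_matching_access(unsigned long addr1, size_t size1,
				 unsigned long addr2, size_t size2)
{
	unsigned long end_range1, end_range2;

	assert(size1 > 0 && size2 > 0);
	end_range1 = addr1 + size1 - 1;
	end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}
```

For example, a 4-byte access at 100 overlaps a 1-byte access at 103 but
not a 4-byte access at 104.
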
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * These are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
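
The DEFINE_TSAN_READ_WRITE() macros rely on token pasting to stamp out
one read/write hook per access size. A userspace sketch of the same
technique follows; the demo_* names are hypothetical (identifiers with
a leading double underscore are reserved outside the implementation),
and demo_check_access() just records its arguments instead of checking
watchpoints.

```c
#include <assert.h>
#include <stddef.h>

/* Records the most recent call so the generated hooks can be observed. */
static size_t demo_last_size;
static int demo_last_is_write;

static void demo_check_access(const void *ptr, size_t size, int is_write)
{
	assert(ptr != NULL);
	demo_last_size = size;
	demo_last_is_write = is_write;
}

/* Generate demo_read<size>() and demo_write<size>() via ## pasting. */
#define DEFINE_DEMO_READ_WRITE(size)                                           \
	void demo_read##size(void *ptr)                                        \
	{                                                                      \
		demo_check_access(ptr, size, 0);                               \
	}                                                                      \
	void demo_write##size(void *ptr)                                       \
	{                                                                      \
		demo_check_access(ptr, size, 1);                               \
	}

DEFINE_DEMO_READ_WRITE(1)
DEFINE_DEMO_READ_WRITE(4)
DEFINE_DEMO_READ_WRITE(8)
```

The `size` argument is used both as part of the function name and as
the literal passed to the check function, which is why a single macro
parameter is enough.
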
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with a very large size may still miss a watchpoint;
> + * however, given this should be rare, this is a reasonable trade-off to make,
> + * since it:
> + *
> + *     1. avoids excessive contention between watchpoint checks and setup;
> + *     2. allows a larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from the other racing thread to the thread
> + * that set up the watchpoint, which then prints the complete report atomically.
> + * Only one struct is needed, as all threads must be serialized to print reports
> + * anyway, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
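
get_stack_skipnr() drops leading frames that belong to the KCSAN
runtime by substring-matching their symbol names. The sketch below
shows the same loop on an array of already-symbolized names, so it can
run in userspace without "%ps"; demo_stack_skipnr() and the frame names
in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first frame whose name contains none of
 * "csan_", "tsan_" or "_once_size"; returns num_entries if every frame
 * matches one of them.
 */
static int demo_stack_skipnr(const char *const frames[], int num_entries)
{
	int skip = 0;

	assert(frames != NULL || num_entries == 0);
	for (; skip < num_entries; ++skip) {
		if (!strstr(frames[skip], "csan_") &&
		    !strstr(frames[skip], "tsan_") &&
		    !strstr(frames[skip], "_once_size"))
			break;
	}
	return skip;
}
```

For the frames "kcsan_report", "__tsan_read4", "__read_once_size",
"some_driver_fn" this returns 3. Only the leading run is skipped; a
matching name deeper in the trace is left alone.
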
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

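For reference, the cases exercised by test_matching_access() in the quoted
selftest are consistent with a plain byte-range overlap check. The sketch
below is an illustration only: ranges_overlap() is a hypothetical name, the
real matching_access() lives in encoding.h (not quoted in this message), and
sizes are assumed to be non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): two accesses match when the
 * byte ranges they touch share at least one byte. Sizes must be non-zero.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long last1 = addr1 + size1 - 1; /* last byte of access 1 */
	unsigned long last2 = addr2 + size2 - 1; /* last byte of access 2 */

	return addr1 <= last2 && addr2 <= last1;
}

/* Same five cases as the quoted test_matching_access(). */
static void ranges_overlap_selfcheck(void)
{
	assert(ranges_overlap(10, 1, 10, 1));  /* identical accesses */
	assert(ranges_overlap(10, 2, 11, 1));  /* second inside first */
	assert(ranges_overlap(10, 1, 9, 2));   /* first inside second */
	assert(!ranges_overlap(10, 1, 11, 1)); /* adjacent, no shared byte */
	assert(!ranges_overlap(9, 1, 10, 1));  /* adjacent, no shared byte */
}
```

The two negative cases are the interesting ones: accesses that are merely
adjacent must not be treated as a match.
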

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23  9:56     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23  9:56 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);

This cast should not be required, or we should fix kcsan_check_atomic_read.
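
A minimal sketch of why the cast looks unnecessary, assuming the check
ultimately lands in a function whose parameter is already
'const volatile void *' (as __kcsan_check_watchpoint() is declared in the
quoted header). All *_stub names below are hypothetical and exist only for
this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical checker with the same parameter types as the quoted prototype. */
static bool check_watchpoint_stub(const volatile void *ptr, size_t size,
				  bool is_write)
{
	(void)is_write;
	return ptr != NULL && size > 0;
}

/* Hypothetical wrapper, shaped like the quoted kcsan_check_atomic_read(). */
#define check_atomic_read_stub(ptr, size)                                      \
	check_watchpoint_stub(ptr, size, false)

static bool read_once_size_stub(const volatile void *p, int size)
{
	/* No cast: p already has exactly the parameter's type. */
	return check_atomic_read_stub(p, (size_t)size);
}

static void read_once_size_stub_selfcheck(void)
{
	volatile int x = 0;

	assert(read_once_size_stub(&x, (int)sizeof(x)));
}
```

Casting to 'const void *' would only drop the volatile qualifier before the
implicit conversion adds it back.
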


> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);

This cast should not be required, or we should fix kcsan_check_atomic_write.

>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used

Aren't all functions in this file available always? They should be.
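
One way to read "available always" is the usual real-declaration-or-stub
pattern, sketched below under that assumption. DEMO_SANITIZER_ENABLED and the
demo_* names are hypothetical stand-ins, not identifiers from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* DEMO_SANITIZER_ENABLED is a hypothetical stand-in for a config symbol. */
#ifdef DEMO_SANITIZER_ENABLED
/* Real implementation is provided by the runtime. */
bool demo_check_watchpoint(const volatile void *ptr, size_t size,
			   bool is_write);
#else
/* No-op stub, so the name exists in every configuration. */
static inline bool demo_check_watchpoint(const volatile void *ptr, size_t size,
					 bool is_write)
{
	(void)ptr;
	(void)size;
	(void)is_write;
	return true;
}
#endif

/* Callers compile in either configuration without their own #ifdef. */
static bool demo_caller(const volatile int *p)
{
	return demo_check_watchpoint(p, sizeof(*p), false);
}

static void demo_caller_selfcheck(void)
{
	volatile int v = 1;

	assert(demo_caller(&v));
}
```

With the config symbol undefined, the call collapses to a constant and the
caller needs no conditional compilation of its own.
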

> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
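
To make the iteration order concrete, here is a minimal user-space sketch of the
same arithmetic, written as a function. The names slot_idx() and
slot_idx_wrapped() are mine, not from the patch, and the constants simply mirror
KCSAN_NUM_WATCHPOINTS and KCSAN_CHECK_ADJACENT. If I read the macro right, the
left neighbour of slot 0 comes out as -1, because `%` in C keeps the sign of
the dividend; the second function is one hypothetical way to keep the index in
range.

```c
#include <assert.h>

#define NUM_WATCHPOINTS 64 /* mirrors KCSAN_NUM_WATCHPOINTS */
#define CHECK_ADJACENT 1   /* mirrors KCSAN_CHECK_ADJACENT */
#define CHECK_NUM_SLOTS (1 + 2 * CHECK_ADJACENT)

/* Same arithmetic as the quoted SLOT_IDX(), with int operands. */
static int slot_idx(int slot, int i)
{
	int offset;

	assert(slot >= 0 && slot < NUM_WATCHPOINTS);
	assert(i >= 0 && i < CHECK_NUM_SLOTS);

	/* i = 0, 1, 2 yields offsets 0, +1, -1: own slot, right, left. */
	offset = ((i + CHECK_ADJACENT) % CHECK_NUM_SLOTS) - CHECK_ADJACENT;

	return (slot + offset) % NUM_WATCHPOINTS;
}

/* Hypothetical variant: bias by the table size so the dividend is >= 0. */
static int slot_idx_wrapped(int slot, int i)
{
	int offset;

	assert(slot >= 0 && slot < NUM_WATCHPOINTS);
	assert(i >= 0 && i < CHECK_NUM_SLOTS);

	offset = ((i + CHECK_ADJACENT) % CHECK_NUM_SLOTS) - CHECK_ADJACENT;

	return (slot + offset + NUM_WATCHPOINTS) % NUM_WATCHPOINTS;
}
```

With slot 10 the visited indices are 10, 11, 9, matching the comment above the
macro. Only the slot 0 case differs between the two functions.
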
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
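
The three helpers above form a small lock-free state machine per slot:
INVALID -> encoded (insert), encoded -> CONSUMED (racing access), and
anything -> INVALID (owner removes). A user-space sketch with C11 atomics,
assuming relaxed ordering is enough as in the patch; the slot_*() names are
mine and this is only meant to illustrate the transitions, not to replace the
kernel atomics:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L
#define CONSUMED_WATCHPOINT 1L

/* Claim a free slot: INVALID -> encoded. Fails if the slot is in use. */
static bool slot_insert(atomic_long *slot, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	assert(encoded != INVALID_WATCHPOINT && encoded != CONSUMED_WATCHPOINT);

	return atomic_compare_exchange_strong_explicit(slot, &expect, encoded,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Racing accessor: encoded -> CONSUMED, only if the slot is unchanged. */
static bool slot_try_consume(atomic_long *slot, long encoded)
{
	return atomic_compare_exchange_strong_explicit(slot, &encoded,
						       CONSUMED_WATCHPOINT,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Owner: reset to INVALID; returns false if a racer consumed the slot. */
static bool slot_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

A false return from slot_remove() is what tells the thread that set up the
watchpoint that some other access hit it during the delay.
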
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * the context's atomic_next counter for each atomic access.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
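
For readers who want to see the sampling in isolation, a minimal sketch of the
"every Nth non-atomic access" counter. The value 2000 is an assumed stand-in for
CONFIG_KCSAN_WATCH_SKIP_INST, a thread-local variable stands in for the per-CPU
counter, and both function names are mine:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; the real one comes from CONFIG_KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP_INST 2000UL

static_assert(WATCH_SKIP_INST > 0, "sampling interval must be non-zero");

/* Stand-in for the per-CPU kcsan_skip counter: one instance per thread. */
static _Thread_local unsigned long skip_counter;

static bool should_watch_sketch(bool access_is_atomic)
{
	/* Atomic accesses are never watched and do not advance the counter. */
	if (access_is_atomic)
		return false;

	return (++skip_counter % WATCH_SKIP_INST) == 0;
}

/* Count how many of n plain accesses would get a watchpoint. */
static unsigned long count_watched(unsigned long n)
{
	unsigned long hits = 0;

	while (n--)
		hits += should_watch_sketch(false);

	return hits;
}
```

The counter is deterministic per CPU; the unpredictability the comment refers to
comes from interleaving with everything else that runs on that CPU.
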
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
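
The "race of unknown origin" case boils down to: snapshot the value, wait,
re-read, compare. A stripped-down user-space sketch of just that comparison,
without the watchpoint or the delay; snapshot_value() and value_changed() are
hypothetical names, and sizes other than 1, 2, 4 and 8 are not diffed, as in
the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Snapshot a 1, 2, 4 or 8 byte object; other sizes are not diffed. */
static bool snapshot_value(const volatile void *ptr, size_t size, uint64_t *out)
{
	assert(ptr != NULL && out != NULL);

	switch (size) {
	case 1:
		*out = *(const volatile uint8_t *)ptr;
		return true;
	case 2:
		*out = *(const volatile uint16_t *)ptr;
		return true;
	case 4:
		*out = *(const volatile uint32_t *)ptr;
		return true;
	case 8:
		*out = *(const volatile uint64_t *)ptr;
		return true;
	default:
		return false;
	}
}

/* True if the object no longer holds the value captured in `before`. */
static bool value_changed(const volatile void *ptr, size_t size,
			  uint64_t before)
{
	uint64_t now;

	return snapshot_value(ptr, size, &now) && now != before;
}
```

Note that a writer which stores the same value back is invisible to this check,
so it can only ever under-report.
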
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true: whitelist; false: blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
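
The lookup logic here (sort lazily, binary-search, invert for whitelist mode)
can be tried out in user space with qsort() and bsearch(). A minimal sketch,
without the locking and the kallsyms step; struct filterlist and skip_report()
are stand-in names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of valid entries */
	bool sorted;          /* true once addrs[] is sorted */
	bool whitelist;       /* true: whitelist; false: blacklist */
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for func_start should be suppressed. */
static bool skip_report(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	assert(fl != NULL);

	/* An empty list filters nothing, in either mode. */
	if (fl->used == 0)
		return false;

	/* Sort on first use after an insertion, then binary-search. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_start, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	return fl->whitelist ? !found : found;
}
```

One consequence worth knowing: switching to whitelist mode with an empty list
does not suppress everything, since the early return comes before the
inversion.
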
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
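
A note on the command matching: since every keyword is compared with strncmp()
over the keyword length only, anything that merely starts with a keyword is
accepted too. A small user-space sketch of the dispatch, leaving out the
copy_from_user() and strstrip() steps; the enum, the HAS_PREFIX() macro and
classify_cmd() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kcsan_cmd {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_ADD_FUNC,
	CMD_INVALID,
};

/* Prefix compare against a string literal, as the quoted code does. */
#define HAS_PREFIX(arg, lit) (strncmp((arg), (lit), sizeof(lit) - 1) == 0)

/* On CMD_ADD_FUNC, *func points at the function name after the '!'. */
static enum kcsan_cmd classify_cmd(const char *arg, const char **func)
{
	assert(arg != NULL && func != NULL);

	*func = NULL;
	if (HAS_PREFIX(arg, "on"))
		return CMD_ON;
	if (HAS_PREFIX(arg, "off"))
		return CMD_OFF;
	if (HAS_PREFIX(arg, "whitelist"))
		return CMD_WHITELIST;
	if (HAS_PREFIX(arg, "blacklist"))
		return CMD_BLACKLIST;
	if (arg[0] == '!') {
		*func = arg + 1;
		return CMD_ADD_FUNC;
	}
	return CMD_INVALID;
}
```

So "online" would enable KCSAN just like "on". That may well be intended, but
an exact match would be stricter.
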
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> + * bits; however, most 64-bit architectures do not use the full 64-bit address
> + * space. Also, for a false positive to be observable, 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
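
To check my understanding of the layout (1 write bit, then the size field, then
the masked address), here is a user-space round trip on a fixed 64-bit word. It
assumes 4 KiB pages, so bits_per(2 * 4096) gives 14 size bits. Note that the
sketch derives both masks from the address width, which is my assumption and
differs slightly from the GENMASK() bounds quoted above; encode_wp() and
decode_wp() are stand-in names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_BITS 14                   /* bits_per(8192), 4 KiB pages assumed */
#define ADDR_BITS (64 - 1 - SIZE_BITS) /* 49 low bits hold the address */

#define WRITE_MASK (UINT64_C(1) << 63)
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define SIZE_MASK (~(WRITE_MASK | ADDR_MASK))

/* The three fields must tile the word exactly, with no gap or overlap. */
static_assert((WRITE_MASK ^ SIZE_MASK ^ ADDR_MASK) == UINT64_MAX,
	      "watchpoint fields must cover all 64 bits");

static uint64_t encode_wp(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WRITE_MASK : 0) | ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

static bool decode_wp(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	/* 0 and 1 are reserved for INVALID and CONSUMED. */
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

The upper address bits are lost in the round trip, which is why lookups compare
masked addresses only.
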
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
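
A quick user-space illustration of these two helpers, assuming 4 KiB pages and
64 watchpoints; the lower-case constants' values are assumptions, the function
bodies follow the quoted code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumed page size */
#define NUM_WATCHPOINTS 64 /* mirrors KCSAN_NUM_WATCHPOINTS */

/* All addresses in one page share a slot; pages 64 apart alias. */
static int watchpoint_slot(unsigned long addr)
{
	return (addr / PAGE_SIZE) % NUM_WATCHPOINTS;
}

/* Closed-interval overlap test on [addr, addr + size - 1]. */
static bool matching_access(unsigned long addr1, size_t size1,
			    unsigned long addr2, size_t size2)
{
	unsigned long end_range1, end_range2;

	/* A zero size would underflow the end computation below. */
	assert(size1 > 0 && size2 > 0);

	end_range1 = addr1 + size1 - 1;
	end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}
```

For example, a 4-byte access at 0x1000 overlaps a 1-byte access at 0x1002 but
not a 4-byte access at 0x1004, and pages 1 and 65 land in the same slot.
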
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
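
For anyone unfamiliar with the TSAN hooks: as far as I understand it, the
compiler places a call to the matching __tsan_{read,write}N hook before each
plain load or store. The sketch below hand-writes what that would look like for
`*dst = *src + 1;`. The hooks are renamed (sketch_ prefix) so they cannot clash
with a real TSAN runtime, and the check functions only count calls; all names
are mine:

```c
#include <assert.h>
#include <stddef.h>

static size_t reads_seen, writes_seen;

/* Stand-ins for __kcsan_check_read()/__kcsan_check_write(); count only. */
static void check_read(const volatile void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	reads_seen++;
}

static void check_write(const volatile void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	writes_seen++;
}

/* Roughly what DEFINE_TSAN_READ_WRITE(4) expands to, minus the exports. */
static void sketch_tsan_read4(void *ptr)
{
	check_read(ptr, 4);
}

static void sketch_tsan_write4(void *ptr)
{
	check_write(ptr, 4);
}

/* Hand-instrumented equivalent of: *dst = *src + 1; */
static void instrumented_copy_inc(int *dst, int *src)
{
	int tmp;

	assert(dst != NULL && src != NULL);

	sketch_tsan_read4(src);  /* hook runs before the load */
	tmp = *src;
	sketch_tsan_write4(dst); /* hook runs before the store */
	*dst = tmp + 1;
}
```

This also shows why inline asm stays invisible: there is no compiler-generated
load or store to attach a hook to.
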
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. if the address slot is already occupied, check if any adjacent slots
> + *        are free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots for a matching watchpoint.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will:
> + *
> + *     1. avoid excessive contention between watchpoint checks and setup;
> + *     2. allow a larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
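
A note on the skip logic above: it can be tried out in userspace with a
small sketch like the one below. This is not the patch's code; the name
skip_runtime_frames() is made up for illustration, and it takes
already-symbolized names instead of formatting addresses with "%ps".

```c
#include <assert.h>
#include <string.h>

/*
 * Count the leading frames whose symbol name looks like sanitizer runtime
 * code. Uses the same three substrings as get_stack_skipnr() in the patch.
 * Hypothetical helper: operates on names, not on return addresses.
 */
static int skip_runtime_frames(const char *const names[], int num_entries)
{
	int skip = 0;

	assert(num_entries >= 0);
	for (; skip < num_entries; ++skip) {
		if (!strstr(names[skip], "csan_") &&
		    !strstr(names[skip], "tsan_") &&
		    !strstr(names[skip], "_once_size"))
			break; /* first frame outside the runtime */
	}
	return skip;
}
```

The return value equals num_entries when every frame matches, so a
caller that indexes the array with it may want to check for that case
first.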
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
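
For readers without encoding.h at hand, the round-trip property this
test checks can be sketched as below. The bit layout (write flag in the
top bit, 14 size bits, the rest address bits) and all names are
assumptions made for illustration only; the real layout lives in
encoding.h, which is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [63] write flag, [62:49] size, [48:0] address bits. */
#define SIZE_BITS  14
#define SIZE_SHIFT (63 - SIZE_BITS)
#define WRITE_MASK (UINT64_C(1) << 63)
#define SIZE_MASK  (((UINT64_C(1) << SIZE_BITS) - 1) << SIZE_SHIFT)
#define ADDR_MASK  ((UINT64_C(1) << SIZE_SHIFT) - 1)
#define MAX_SIZE   ((UINT64_C(1) << SIZE_BITS) - 1)

/* Pack an access into one word; size must be non-zero, so 0 is never produced. */
static uint64_t encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size > 0 && size <= MAX_SIZE);
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << SIZE_SHIFT) | (addr & ADDR_MASK);
}

/* Unpack; returns false for the reserved "empty" value 0. */
static bool decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		   bool *is_write)
{
	if (wp == 0)
		return false;
	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> SIZE_SHIFT);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

Only the masked address survives the round trip, which is why the test
compares against addr & WATCHPOINT_ADDR_MASK rather than addr itself.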
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
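
The five cases above are consistent with matching_access() being a plain
byte-range overlap check. A minimal userspace sketch of such a check,
assuming that reading (the function name is hypothetical, and the real
helper is in encoding.h, not shown in this mail):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least
 * one byte. Sizes must be non-zero.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);
	end1 = addr1 + size1 - 1; /* last byte of the first access */
	end2 = addr2 + size2 - 1; /* last byte of the second access */
	return addr1 <= end2 && addr2 <= end1;
}
```

With this definition, adjacent but disjoint accesses such as (10, 1) and
(11, 1) do not match, which is what the last two checks expect.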
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
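
To make the interaction of the three delay options concrete, here is a
rough sketch of how a stall length could be derived from them. The
function names and the exact formula are assumptions for illustration;
the code that consumes these options is in core.c and is not quoted in
this part of the mail.

```c
#include <assert.h>
#include <stdbool.h>

#define UDELAY_MAX_TASK      80 /* default of KCSAN_UDELAY_MAX_TASK */
#define UDELAY_MAX_INTERRUPT 20 /* default of KCSAN_UDELAY_MAX_INTERRUPT */

/* Upper bound on the stall, depending on the calling context. */
static unsigned int max_delay_us(bool in_task)
{
	return in_task ? UDELAY_MAX_TASK : UDELAY_MAX_INTERRUPT;
}

/*
 * Pick a stall length in microseconds in the range [1, max_us].
 * rnd is any random number supplied by the caller; without randomization
 * the maximum is used as-is.
 */
static unsigned int pick_delay_us(unsigned int max_us, bool randomize,
				  unsigned int rnd)
{
	assert(max_us > 0);
	if (!randomize)
		return max_us;
	return max_us - (rnd % max_us);
}
```

Passing the random value in as a parameter keeps the sketch
deterministic and easy to check by hand.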
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
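
The "1 in KCSAN_WATCH_SKIP_INST" sampling described in this help text
can be pictured with a per-CPU counter, roughly as below. This is a
sketch under assumptions: struct cpu_state and should_sample() are
made-up stand-ins for the per-CPU kcsan_skip counter and should_watch(),
whose body is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WATCH_SKIP_INST 2000 /* default of KCSAN_WATCH_SKIP_INST */

/* Hypothetical stand-in for one CPU's skip counter. */
struct cpu_state {
	unsigned long skip;
};

/* Returns true for one access out of every WATCH_SKIP_INST on this CPU. */
static bool should_sample(struct cpu_state *cpu)
{
	assert(cpu != NULL);
	return ++cpu->skip % WATCH_SKIP_INST == 0;
}
```

Lowering the constant makes the function return true more often, which
matches the trade-off the help text describes: more watchpoints and
more stalls, at the cost of throughput.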
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>



* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-23 10:03     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:03 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
> +
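
The iteration order produced by SLOT_IDX() is easier to see when the
arithmetic is written out as a function. The sketch below copies the
macro's expression into slot_idx() and adds a hypothetical
slot_idx_wrapped() variant for comparison. One thing it makes visible:
since slot is an int and C's % truncates toward zero, the third
iteration yields -1 for slot 0, assuming watchpoint_slot() (in
encoding.h, not quoted here) can return 0.

```c
#include <assert.h>

#define KCSAN_NUM_WATCHPOINTS 64
#define KCSAN_CHECK_ADJACENT 1
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)

/* Same arithmetic as SLOT_IDX() in the patch, written as a function. */
static int slot_idx(int slot, int i)
{
	assert(i >= 0 && i < CHECK_NUM_SLOTS);
	return (slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
			KCSAN_CHECK_ADJACENT)) %
	       KCSAN_NUM_WATCHPOINTS;
}

/*
 * Hypothetical variant: bias by the table size before the final modulo so
 * that the result always lies in [0, KCSAN_NUM_WATCHPOINTS).
 */
static int slot_idx_wrapped(int slot, int i)
{
	assert(i >= 0 && i < CHECK_NUM_SLOTS);
	return (slot + KCSAN_NUM_WATCHPOINTS +
		(((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
		 KCSAN_CHECK_ADJACENT)) %
	       KCSAN_NUM_WATCHPOINTS;
}
```

For slot 5 both versions visit 5, 6, 4 in that order. They differ only
for slot 0, where the biased version wraps to 63 instead of going
negative.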
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
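
The insert/consume/remove helpers above form a small lock-free state
machine on a single word. The following userspace sketch models it with
C11 atomics. Mapping atomic_long_try_cmpxchg_relaxed() and
atomic_long_xchg_relaxed() onto <stdatomic.h> calls is my assumption for
illustration, and the sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_INVALID  0L	/* free slot */
#define SK_CONSUMED 1L	/* a checking thread hit the watchpoint */

/* Setup side: claim a free slot by swinging it from INVALID to encoded. */
static bool sk_insert(atomic_long *wp, long encoded)
{
	long expect = SK_INVALID;

	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Checking side: succeeds only if the slot still holds the value we saw. */
static bool sk_try_consume(atomic_long *wp, long seen)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &seen, SK_CONSUMED,
		memory_order_relaxed, memory_order_relaxed);
}

/* Setup side, after the delay: free the slot; false means it was consumed. */
static bool sk_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, SK_INVALID,
					memory_order_relaxed) != SK_CONSUMED;
}
```

A second consumer, or a consumer arriving after removal, fails the
compare-exchange, which corresponds to the three "may return false" cases
listed in the comment on try_consume_watchpoint().
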
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> + * bits; however, most 64-bit architectures do not use the full 64-bit address
> + * space. Also, for a false positive to be observable, 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
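
For readers following the bit layout, here is a userspace round-trip
sketch of the encoding. It assumes 64-bit words, 4 KiB pages and
KCSAN_CHECK_ADJACENT == 1, which gives MAX_ENCODABLE_SIZE = 8192 and
therefore 14 size bits. The sk_* names are hypothetical. Note that the
sketch derives both masks directly from the address bit count; that is my
simplification and not the quoted GENMASK() bounds, since the quoted
WATCHPOINT_ADDR_MASK appears to keep one address bit fewer than
WATCHPOINT_ADDR_BITS would allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 64-bit words, 4 KiB pages, one adjacent slot. */
#define SK_BITS       64
#define SK_SIZE_BITS  14	/* bits needed to represent 2 * 4096 */
#define SK_ADDR_BITS  (SK_BITS - 1 - SK_SIZE_BITS)

/* Layout, high to low: [ is_write:1 | size:SIZE_BITS | addr:ADDR_BITS ] */
#define SK_WRITE_MASK (UINT64_C(1) << (SK_BITS - 1))
#define SK_ADDR_MASK  ((UINT64_C(1) << SK_ADDR_BITS) - 1)
#define SK_SIZE_MASK  (~(SK_WRITE_MASK | SK_ADDR_MASK))

static uint64_t sk_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? SK_WRITE_MASK : 0) |
	       ((uint64_t)size << SK_ADDR_BITS) |
	       (addr & SK_ADDR_MASK);
}

static bool sk_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	/* 0 and 1 are reserved for the INVALID and CONSUMED states. */
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp & SK_SIZE_MASK) >> SK_ADDR_BITS);
	*is_write = (wp & SK_WRITE_MASK) != 0;
	return true;
}
```

Decoding returns the masked address, not the original one, which is why
find_watchpoint() compares against addr & WATCHPOINT_ADDR_MASK.
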
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
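
To make the frame-skipping behaviour easier to reason about outside the
kernel, here is a minimal userspace sketch. It assumes the frames have
already been symbolized into strings (the patch does this with "%ps");
skip_sanitizer_frames() is a hypothetical name, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of get_stack_skipnr(): return the index of
 * the first frame whose symbol name does not look like sanitizer runtime or
 * READ_ONCE/WRITE_ONCE helper code. Returns num_entries if all frames match.
 */
static int skip_sanitizer_frames(const char *const names[], int num_entries)
{
	int skip = 0;

	for (; skip < num_entries; ++skip) {
		const char *name = names[skip];

		if (!strstr(name, "csan_") && !strstr(name, "tsan_") &&
		    !strstr(name, "_once_size"))
			break;
	}
	return skip;
}
```

In this sketch the result equals num_entries when every frame matches,
so a caller that indexes the array with the result has to account for
that case.
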
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
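
The title ordering in print_summary() can be illustrated with a small
userspace sketch. format_race_title() and the plain string arguments are
assumptions for illustration; the patch compares symbolized addresses
via sym_strcmp() instead.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a report title so that the two function names
 * always appear in lexicographic order, independent of which thread reports.
 */
static int format_race_title(char *buf, size_t len, const char *this_fn,
			     const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	return snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
			cmp < 0 ? other_fn : this_fn,
			cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two arguments yields the same string, which is what keeps
titles stable for deduplication of reports.
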
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
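
For readers without encoding.h at hand: the five cases above are
consistent with a plain byte-range overlap check. The sketch below is an
assumption about the semantics, not the definition from the patch, and
ranges_overlap() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed semantics (not taken from encoding.h): two accesses match if the
 * byte ranges [addr1, addr1 + size1) and [addr2, addr2 + size2) overlap.
 * Address overflow is ignored in this sketch.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	return addr1 + size1 > addr2 && addr2 + size2 > addr1;
}
```

With this definition, adjacent but disjoint one-byte accesses (for
example 10 and 11) do not match, while a two-byte access at 10 matches a
one-byte access at 11.
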
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.

It would make sense to reference Documentation/dev-tools/kcsan.rst here,
so that readers of the help text can find the full details.

> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"

All of these configs are already inside the KCSAN submenu, so there is
no need to prefix each of them with "KCSAN:".

> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 10:03     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:03 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
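
The probe order described in the comment (own slot, then right, then
left) follows from the inner part of SLOT_IDX(). A small sketch, with
slot_offset() as a hypothetical helper and the constants copied from the
patch:

```c
#define CHECK_ADJACENT 1	/* mirrors KCSAN_CHECK_ADJACENT in the patch */
#define CHECK_NUM_SLOTS (1 + 2 * CHECK_ADJACENT)

/*
 * Offset relative to the home slot that is probed on iteration i; this is the
 * inner part of SLOT_IDX() before the final modulo by the table size.
 */
static int slot_offset(int i)
{
	return ((i + CHECK_ADJACENT) % CHECK_NUM_SLOTS) - CHECK_ADJACENT;
}
```

For i = 0, 1, 2 this gives offsets 0, +1 and -1. The sketch leaves out
the final modulo on purpose: with a signed slot, C's % keeps the sign of
the dividend, so how the left neighbour of slot 0 wraps depends on the
types involved, which are defined outside this hunk.
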
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
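
For readers following the slot handling above: each watchpoint slot
appears to move through three states (free, armed with an encoded
value, consumed by a racing access). Below is a minimal userspace
sketch of that lifecycle using C11 atomics. The names slot_insert(),
slot_try_consume(), slot_remove() and NUM_SLOTS are hypothetical and
only mirror the patch; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_INVALID  0L /* free slot, mirrors INVALID_WATCHPOINT */
#define SLOT_CONSUMED 1L /* claimed by a racing access, mirrors CONSUMED_WATCHPOINT */
#define NUM_SLOTS     4  /* hypothetical; kept small for illustration */

/* Static storage: every slot starts out as SLOT_INVALID (zero). */
static atomic_long slots[NUM_SLOTS];

/* Arm the first free slot with 'encoded'; returns NULL if all are busy. */
static atomic_long *slot_insert(long encoded)
{
	assert(encoded != SLOT_INVALID && encoded != SLOT_CONSUMED);

	for (size_t i = 0; i < NUM_SLOTS; ++i) {
		long expect = SLOT_INVALID;

		if (atomic_compare_exchange_strong_explicit(
			    &slots[i], &expect, encoded,
			    memory_order_relaxed, memory_order_relaxed))
			return &slots[i];
	}
	return NULL;
}

/* Racing access: succeeds only if the slot still holds 'encoded'. */
static bool slot_try_consume(atomic_long *slot, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &encoded, SLOT_CONSUMED,
		memory_order_relaxed, memory_order_relaxed);
}

/* Owner: free the slot; returns true if nobody consumed it meanwhile. */
static bool slot_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, SLOT_INVALID,
					memory_order_relaxed) != SLOT_CONSUMED;
}
```

A second consume attempt on the same slot fails, which corresponds to
case 1 in the comment above try_consume_watchpoint().
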
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupts, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) is_atomic() must run to
> +        * decrement ctx->atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
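
As an aside, the counter-based sampling in should_watch() can be
pictured with the small userspace sketch below. WATCH_SKIP is a
hypothetical stand-in for CONFIG_KCSAN_WATCH_SKIP_INST, and a plain
counter passed by pointer replaces the per-CPU variable; the sketch
only shows the modulo test, not the per-CPU machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP 4

/* Returns true on every WATCH_SKIP-th call made with the same counter. */
static bool should_sample(unsigned long *counter)
{
	assert(counter != NULL);
	return (++*counter % WATCH_SKIP) == 0;
}
```

With one counter per CPU, which accesses get sampled depends on how
work is interleaved across CPUs, which seems to be the source of the
non-determinism the comment mentions.
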
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
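
To illustrate the lazy sort plus binary search in kcsan_skip_report(),
here is a userspace sketch that uses qsort() and bsearch() from the C
standard library in place of the kernel's sort() and bsearch(). The
struct filterlist and skip_report() names are hypothetical, and locking
is left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two addresses, as qsort/bsearch expect. */
static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Hypothetical, simplified view of report_filterlist. */
struct filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of valid entries */
	bool sorted;          /* cleared whenever an entry is appended */
	bool whitelist;       /* invert the match when set */
};

/* Returns true if reports for func_start should be suppressed. */
static bool skip_report(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	assert(fl != NULL);
	if (fl->used == 0)
		return false; /* an empty list never filters anything */

	/* Sort only when needed, so repeated lookups stay O(log n). */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_start, fl->addrs, fl->used,
			sizeof(*fl->addrs), cmp_addrs) != NULL;

	return fl->whitelist ? !found : found;
}
```

Deferring the sort to lookup time keeps insertion cheap, at the cost of
one sort after each batch of insertions.
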
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, WATCHPOINT_ADDR_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(WATCHPOINT_ADDR_BITS - 1, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
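
To make the watchpoint layout in encoding.h easier to follow, here is
a userspace sketch of packing an access into one 64-bit word: the top
bit is the write flag, the next SIZE_BITS bits hold the size, and the
remaining low bits hold the truncated address. It assumes 64-bit longs
and SIZE_BITS = 14 (what bits_per(8192) would give with 4 KiB pages),
and it assumes the three fields are packed contiguously. wp_encode(),
wp_decode() and ranges_overlap() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64
#define SIZE_BITS 14 /* assumption: bits_per(8192), i.e. 4 KiB pages */
#define ADDR_BITS (WORD_BITS - 1 - SIZE_BITS)

#define WRITE_MASK (UINT64_C(1) << (WORD_BITS - 1))
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)
#define SIZE_MASK  (~(WRITE_MASK | ADDR_MASK))

/* Pack write flag, size and truncated address into one word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size < (UINT64_C(1) << SIZE_BITS));
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Inverse of wp_encode(); the upper address bits are not recoverable. */
static void wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
}

/* Closed-interval overlap test, same idea as matching_access(). */
static bool ranges_overlap(uint64_t a1, size_t s1, uint64_t a2, size_t s2)
{
	return a1 <= a2 + s2 - 1 && a2 <= a1 + s1 - 1;
}
```

Because the address is truncated, two distinct addresses can decode to
the same masked value; that is the false-positive case the comment in
encoding.h describes.
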
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
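
For anyone less familiar with the token-pasting pattern used by
DEFINE_TSAN_READ_WRITE(), the sketch below shows the same idea in
userspace. The hook_read/hook_write prefixes and the check_read() and
check_write() helpers are hypothetical stand-ins (the real entry points
must keep the __tsan names the compiler emits); the byte counters exist
only so the effect is observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for __kcsan_check_read/__kcsan_check_write. */
static size_t bytes_read, bytes_written;

static void check_read(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	bytes_read += size;
}

static void check_write(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	bytes_written += size;
}

/*
 * One macro invocation generates a read hook and a write hook whose
 * names end in the access size, each forwarding that size as a constant.
 */
#define DEFINE_HOOKS(size)                                              \
	static void hook_read##size(void *ptr)                          \
	{                                                               \
		check_read(ptr, size);                                  \
	}                                                               \
	static void hook_write##size(void *ptr)                         \
	{                                                               \
		check_write(ptr, size);                                 \
	}

DEFINE_HOOKS(1)
DEFINE_HOOKS(8)
```

The pointer is only forwarded, never dereferenced, in these hooks.
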
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signaled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads have to be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
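
For illustration, the frame filtering above can be sketched in userspace as
follows. This is a minimal sketch, not the kernel code: the frames are plain
strings rather than addresses symbolized with "%ps", and the name
skip_runtime_frames() is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch: return the index of the first frame whose symbol name
 * does not look like sanitizer runtime code. Frames are plain strings here
 * (an assumption of this sketch).
 */
size_t skip_runtime_frames(const char *const frames[], size_t n)
{
	size_t skip = 0;

	assert(frames != NULL || n == 0);
	for (; skip < n; ++skip) {
		const char *name = frames[skip];

		/* Same three substrings as in the quoted helper. */
		if (!strstr(name, "csan_") && !strstr(name, "tsan_") &&
		    !strstr(name, "_once_size"))
			break;
	}
	return skip; /* equals n if every frame was filtered */
}
```

The sketch returns n when every frame matched, so a caller that indexes
frames[skip] afterwards would want to check skip < n first.
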
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
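
To make the title ordering in the kcsan_report_race_setup case concrete, here
is a small userspace sketch. It assumes the function names are already
available as strings (the kernel code symbolizes addresses first), and
format_race_title() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch: build a report title from two function names so that the
 * same pair yields the same title, whichever side does the reporting.
 * Returns the snprintf() result.
 */
int format_race_title(char *buf, size_t len, const char *this_fn,
		      const char *other_fn)
{
	int cmp;

	assert(buf != NULL && this_fn != NULL && other_fn != NULL);
	cmp = strcmp(other_fn, this_fn);
	/* The name that sorts first is always printed first. */
	return snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
			cmp < 0 ? other_fn : this_fn,
			cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two name arguments produces an identical string, which is what
keeps bug titles stable for deduplication.
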
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
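
As a reading aid for the round-trip property this test checks, here is a
userspace sketch with a made-up bit layout: 1 write bit, 13 size bits and 50
address bits in a 64-bit word. The layout, the WP_* constants and the wp_*
function names are assumptions for illustration only; the real encoding lives
in encoding.h, which is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: [63] write | [62..50] size | [49..0] masked address. */
#define WP_SIZE_BITS  13
#define WP_ADDR_BITS  (64 - 1 - WP_SIZE_BITS)
#define WP_WRITE_MASK (UINT64_C(1) << 63)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_ADDR_BITS) - 1)

/* Hypothetical sentinels that never decode as a valid watchpoint. */
#define WP_INVALID  UINT64_C(0)
#define WP_CONSUMED UINT64_C(1)

uint64_t wp_encode(uint64_t addr, uint64_t size, bool is_write)
{
	/* A non-zero size keeps encoded values distinct from the sentinels. */
	assert(size >= 1 && size < (UINT64_C(1) << WP_SIZE_BITS));
	return (is_write ? WP_WRITE_MASK : 0) | (size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

bool wp_decode(uint64_t wp, uint64_t *addr_masked, uint64_t *size,
	       bool *is_write)
{
	if (wp == WP_INVALID || wp == WP_CONSUMED)
		return false;

	*addr_masked = wp & WP_ADDR_MASK;
	*size = (wp & ~WP_WRITE_MASK) >> WP_ADDR_BITS;
	*is_write = (wp & WP_WRITE_MASK) != 0;
	return true;
}
```

Decoding returns the masked address, not the original one, which is why the
test above compares against addr & WATCHPOINT_ADDR_MASK.
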
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
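
The five cases above are consistent with a plain byte-range overlap check. A
minimal sketch, assuming that is what matching_access() implements (its
definition is in encoding.h and not quoted here); ranges_overlap() is a
hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch: two accesses match if their byte ranges [addr, addr + size - 1]
 * share at least one byte. Sizes are expected to be non-zero.
 */
bool ranges_overlap(unsigned long addr1, size_t size1,
		    unsigned long addr2, size_t size2)
{
	assert(size1 > 0 && size2 > 0);
	return addr1 + size1 - 1 >= addr2 && addr2 + size2 - 1 >= addr1;
}
```

With this definition, a 2-byte access at 10 overlaps a 1-byte access at 11,
while two 1-byte accesses at 10 and 11 do not overlap.
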
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.

It would make sense to reference the documentation with the full details
(Documentation/dev-tools/kcsan.rst) here.

> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"

All of these configs are already inside the KCSAN submenu, so there is no
need to prefix each of them with "KCSAN:".

> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 10:03     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:03 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
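
To check the iteration order the comment describes, the offset part of
SLOT_IDX can be isolated as below. This is a sketch with an assumed value of
1 for the adjacency constant (the real KCSAN_CHECK_ADJACENT is defined
elsewhere in the series); slot_offset() is a hypothetical helper.

```c
#include <assert.h>

/* Illustrative value only; stands in for KCSAN_CHECK_ADJACENT. */
#define CHECK_ADJACENT  1
#define NUM_CHECK_SLOTS (1 + 2 * CHECK_ADJACENT)

/*
 * Offset from the home slot visited at iteration i.
 * With CHECK_ADJACENT == 1 the sequence is 0, +1, -1.
 */
int slot_offset(int i)
{
	assert(i >= 0 && i < NUM_CHECK_SLOTS);
	return ((i + CHECK_ADJACENT) % NUM_CHECK_SLOTS) - CHECK_ADJACENT;
}
```

One thing the sketch makes visible: the last offset is negative, and C's %
keeps the sign of the dividend, so whether slot + offset can go below zero
depends on what watchpoint_slot() returns, which is not quoted here.
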
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
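
The insert/consume/remove protocol on a single slot can be sketched with C11
atomics in userspace. The sentinel value chosen for the consumed state and
the slot_* names are assumptions for this sketch; only the zero value for an
invalid slot is stated in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Zero matches the zero-initialized state; the consumed value is assumed. */
#define SLOT_INVALID  0L
#define SLOT_CONSUMED 1L

/* Claim an empty slot for `encoded`; fails if the slot is in use. */
bool slot_insert(atomic_long *slot, long encoded)
{
	long expect = SLOT_INVALID;

	assert(encoded != SLOT_INVALID && encoded != SLOT_CONSUMED);
	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded, memory_order_relaxed,
		memory_order_relaxed);
}

/* Mark the slot consumed, but only if it still holds the observed value. */
bool slot_try_consume(atomic_long *slot, long observed)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &observed, SLOT_CONSUMED, memory_order_relaxed,
		memory_order_relaxed);
}

/* Release the slot; returns true if nobody consumed it in the meantime. */
bool slot_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, SLOT_INVALID,
					memory_order_relaxed) != SLOT_CONSUMED;
}
```

The owner learns about a race from the return value of the remove step: a
false return means another thread consumed the watchpoint while the owner was
stalled.
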
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * the context's atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
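The disable/enable pair above behaves like a nesting depth counter that recovers from an unbalanced enable. A minimal sketch of that behaviour, assuming a single global context; all names are hypothetical, and the WARN() is replaced by a counter:

```c
#include <stdbool.h>

static int disable_depth;  /* stand-in for ctx->disable */
static int mismatch_warns; /* counts where the patch would WARN() */

static void disable_sketch(void)
{
	++disable_depth;
}

static void enable_sketch(void)
{
	if (disable_depth-- == 0) {
		disable_depth = 0; /* undo the underflow */
		++mismatch_warns;
	}
}

/* Checking is active only when every disable has been matched by an enable. */
static bool enabled_sketch(void)
{
	return disable_depth == 0;
}
```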
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
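The filter lookup above is a sort followed by a binary search, with the result inverted in whitelist mode. A userspace sketch of the same logic using qsort()/bsearch() from libc instead of the kernel's sort()/bsearch(); locking and the `sorted` flag are left out, and skip_report_sketch() is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if reports for the function starting at func_start are skipped. */
static bool skip_report_sketch(unsigned long *addrs, size_t used,
			       bool whitelist, unsigned long func_start)
{
	bool found;

	if (used == 0)
		return false; /* an empty list filters nothing, in either mode */

	qsort(addrs, used, sizeof(*addrs), cmp_addrs);
	found = bsearch(&func_start, addrs, used, sizeof(*addrs),
			cmp_addrs) != NULL;

	return whitelist ? !found : found;
}
```

Note that, as in the patch, an empty list never skips a report, even when the list is in whitelist mode.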
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               addr; /* already looked up above */
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
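The command dispatch in debugfs_write() matches by prefix, because each strncmp() compares only the length of the keyword. A userspace sketch of the same dispatch, without the copy_from_user()/strstrip() steps; the enum and function names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

enum cmd_sketch {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_FILTER_FUNC,
	CMD_INVALID,
};

/* On CMD_FILTER_FUNC, *func points at the function name after the '!'. */
static enum cmd_sketch parse_cmd_sketch(const char *arg, const char **func)
{
	*func = NULL;

	if (!strncmp(arg, "on", sizeof("on") - 1))
		return CMD_ON;
	if (!strncmp(arg, "off", sizeof("off") - 1))
		return CMD_OFF;
	if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
		return CMD_WHITELIST;
	if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
		return CMD_BLACKLIST;
	if (arg[0] == '!') {
		*func = &arg[1];
		return CMD_FILTER_FUNC;
	}
	return CMD_INVALID;
}
```

One consequence of the prefix comparison is that any input starting with a keyword is accepted as that keyword, so "only" is treated the same as "on".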
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> + * bits; however, most 64-bit architectures do not use the full 64-bit address
> + * space. Also, for a false positive to be observable, 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both are assumed to be very unlikely. However, in case it still happens, the
> + * report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, WATCHPOINT_ADDR_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(WATCHPOINT_ADDR_BITS - 1, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
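A round trip through the encoding may help here. This is a userspace sketch of a three-field layout (1 write bit, then size bits, then address bits) following the comment above WATCHPOINT_ADDR_BITS. It assumes a 64-bit long, 4 KiB pages and KCSAN_CHECK_ADJACENT == 1, so that bits_per(MAX_ENCODABLE_SIZE) is 14; the SK_ macros and `_sketch` functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BITS_PER_LONG 64
#define SK_SIZE_BITS 14 /* bits_per(2 * 4096) under the assumptions above */
#define SK_ADDR_BITS (SK_BITS_PER_LONG - 1 - SK_SIZE_BITS)

#define SK_WRITE_MASK (UINT64_C(1) << (SK_BITS_PER_LONG - 1))
#define SK_ADDR_MASK ((UINT64_C(1) << SK_ADDR_BITS) - 1)
#define SK_SIZE_MASK (~SK_WRITE_MASK & ~SK_ADDR_MASK)

/* Pack is_write, size and the low address bits into one word. */
static uint64_t encode_sketch(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? SK_WRITE_MASK : 0) |
	       ((uint64_t)size << SK_ADDR_BITS) |
	       (addr & SK_ADDR_MASK);
}

/* Unpack the three fields; the upper address bits are not recoverable. */
static void decode_sketch(uint64_t wp, uint64_t *addr_masked, size_t *size,
			  bool *is_write)
{
	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp & SK_SIZE_MASK) >> SK_ADDR_BITS);
	*is_write = (wp & SK_WRITE_MASK) != 0;
}
```

Two addresses that differ only in the discarded upper bits encode identically, which is the source of the encoding false positives that report.c filters out.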
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
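For reference, the slot mapping and the overlap test above can be exercised in userspace. A small sketch, assuming 4 KiB pages and 64 watchpoint slots; the SK_ macros and `_sketch` names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL
#define SK_NUM_WATCHPOINTS 64

/* All addresses within one page share a slot; slots wrap every 64 pages. */
static int slot_sketch(unsigned long addr)
{
	return (int)((addr / SK_PAGE_SIZE) % SK_NUM_WATCHPOINTS);
}

/* Two accesses match if their closed byte ranges intersect. */
static bool overlap_sketch(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1;
	unsigned long end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

A 4-byte access at 0x1000 overlaps a 1-byte access at 0x1003 but not a 4-byte access at 0x1004, since the ranges are compared by their last byte.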
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
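
To make the handoff above easier to follow: the checking thread publishes into a single shared slot once it is free, and the thread that set up the watchpoint consumes it and keeps the lock held until the report is done. Below is a minimal userspace sketch of that pattern. All names (`struct slot`, `slot_try_publish`, `slot_try_consume`, `slot_release`) are hypothetical, a C11 `atomic_flag` stands in for the spinlock, and the retry loop, irq handling and access matching of the patch are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for other_info; a NULL ptr means "slot is free". */
struct slot {
	const volatile void *ptr;
	size_t size;
};

static struct slot shared_slot = { .ptr = NULL };
static atomic_flag slot_lock = ATOMIC_FLAG_INIT; /* stand-in for a spinlock */

static void slot_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&slot_lock, memory_order_acquire))
		; /* spin */
}

static void slot_lock_release(void)
{
	atomic_flag_clear_explicit(&slot_lock, memory_order_release);
}

/* Producer (the "race_check" side): publish only if the slot is free. */
bool slot_try_publish(const volatile void *ptr, size_t size)
{
	bool published = false;

	assert(ptr != NULL);
	slot_lock_acquire();
	if (shared_slot.ptr == NULL) {
		shared_slot.ptr = ptr;
		shared_slot.size = size;
		published = true;
	}
	slot_lock_release();
	return published;
}

/*
 * Consumer (the "race_setup" side): on success the lock stays held, so the
 * slot cannot change while the caller uses it; pair with slot_release().
 */
bool slot_try_consume(struct slot *out)
{
	slot_lock_acquire();
	if (shared_slot.ptr == NULL) {
		slot_lock_release();
		return false; /* nothing published yet */
	}
	*out = shared_slot;
	return true;
}

/* Mark the slot free again and drop the lock taken by slot_try_consume(). */
void slot_release(void)
{
	shared_slot.ptr = NULL;
	slot_lock_release();
}
```

In this sketch a caller would loop on `slot_try_publish()` or `slot_try_consume()` until it succeeds, which corresponds to the `for (;;)` retry in the patch.
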
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
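
For readers who want to try the frame-skipping logic outside the kernel, here is a small sketch that works on already-symbolized names instead of `%ps` output. The function name `skip_runtime_frames` is made up for illustration; the three substrings are the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first frame whose symbol name does not look like
 * sanitizer runtime or READ_ONCE/WRITE_ONCE helper code. Returns n if every
 * frame matched.
 */
size_t skip_runtime_frames(const char *const frames[], size_t n)
{
	size_t skip = 0;

	assert(frames != NULL || n == 0);
	for (; skip < n; ++skip) {
		if (!strstr(frames[skip], "csan_") &&
		    !strstr(frames[skip], "tsan_") &&
		    !strstr(frames[skip], "_once_size"))
			break;
	}
	return skip;
}
```

Note that in this sketch the result can equal `n` when every frame matches, so a caller should check the bound before indexing the array with it.
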
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
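
The title ordering in print_summary() is what makes the same race produce the same bug title regardless of which thread ends up printing. A userspace sketch of just that step, using plain strings in place of symbolized addresses (`format_race_title` is a hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a race title with the two function names in lexicographic order. */
void format_race_title(char *buf, size_t len, const char *this_fn,
		       const char *other_fn)
{
	int cmp;

	assert(buf != NULL && len > 0);
	cmp = strcmp(other_fn, this_fn);
	snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
		 cmp < 0 ? other_fn : this_fn,
		 cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two arguments yields the same string, which keeps titles stable for deduplication by tools that group reports by title.
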
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
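
The five cases in test_matching_access() are consistent with a plain byte-range overlap check. encoding.h is not quoted in this part of the thread, so the following is only an assumption about the semantics, written as a standalone sketch (`ranges_overlap` is a made-up name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least
 * one byte. Comparing the distance between the start addresses against the
 * size of the lower range avoids overflow in addr + size.
 */
bool ranges_overlap(unsigned long addr1, size_t size1,
		    unsigned long addr2, size_t size2)
{
	assert(size1 > 0 && size2 > 0);
	return addr1 < addr2 ? addr2 - addr1 < size1
			     : addr1 - addr2 < size2;
}
```
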
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.

It may make sense to reference the documentation
(Documentation/dev-tools/kcsan.rst) here for full details.

> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"

All of these configs are already inside the KCSAN submenu, so it's not
necessary to prefix each of them with "KCSAN:".

> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>



* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-23 10:09     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:09 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN=y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows developers to determine the possible executions of
> +concurrent code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
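
As an illustration of packing access type, size and address into one word, here is a sketch with an invented bit layout: 1 write bit, 12 size bits and 51 address bits in a 64-bit value. The layout, the `WP_*` macros and the function names are assumptions for this example and may differ from what encoding.h does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [63] write flag, [62:51] size, [50:0] masked address. */
#define WP_SIZE_BITS  12
#define WP_ADDR_BITS  (64 - 1 - WP_SIZE_BITS)
#define WP_WRITE_MASK (UINT64_C(1) << 63)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK  (((UINT64_C(1) << WP_SIZE_BITS) - 1) << WP_ADDR_BITS)
#define WP_MAX_SIZE   ((UINT64_C(1) << WP_SIZE_BITS) - 1)
#define WP_INVALID    UINT64_C(0) /* size >= 1, so no valid encoding is 0 */

uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size >= 1 && size <= WP_MAX_SIZE);
	return (is_write ? WP_WRITE_MASK : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Returns false for the invalid watchpoint, true and the fields otherwise. */
bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
	       bool *is_write)
{
	if (wp == WP_INVALID)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_ADDR_BITS);
	*is_write = (wp & WP_WRITE_MASK) != 0;
	return true;
}
```

Because the upper address bits are dropped, two different addresses can encode to the same value. That is the kind of encoding false positive that report.c re-checks against the full addresses.
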
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
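
To check my reading of the three steps above, here is a minimal single-threaded userspace model. It is only a sketch under stated assumptions: the names (soft_watchpoint, check_access, watch_begin, watch_end) are hypothetical, there is a single watchpoint kept in a plain struct rather than the packed long the patch uses, and the delay is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single watchpoint; the patch packs this into one long. */
struct soft_watchpoint {
	atomic_bool active;
	uintptr_t addr;
	size_t size;
	bool is_write;
};

static struct soft_watchpoint wp;

/* True if [a, a + a_size) and [b, b + b_size) overlap. */
static bool ranges_overlap(uintptr_t a, size_t a_size, uintptr_t b, size_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

/* Step 1: true if this access conflicts with the active watchpoint. */
static bool check_access(const volatile void *ptr, size_t size, bool is_write)
{
	assert(size > 0);
	if (!atomic_load(&wp.active))
		return false;
	if (!is_write && !wp.is_write)
		return false; /* two reads never conflict */
	return ranges_overlap((uintptr_t)ptr, size, wp.addr, wp.size);
}

/* Step 2, first half: publish the watchpoint and snapshot the value. */
static uint32_t watch_begin(const volatile uint32_t *ptr, bool is_write)
{
	wp.addr = (uintptr_t)ptr;
	wp.size = sizeof(*ptr);
	wp.is_write = is_write;
	atomic_store(&wp.active, true);
	return *ptr;
}

/* Step 3: after the delay, drop the watchpoint and re-check the value. */
static bool watch_end(const volatile uint32_t *ptr, uint32_t snapshot)
{
	atomic_store(&wp.active, false);
	return *ptr != snapshot; /* true: race of unknown origin */
}
```

In this model a caller would run watch_begin(), stall, then watch_end(), while every other access goes through check_access(). The real runtime replaces the struct with a lock-free encoded slot.
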
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omitted annotation
> +   leads to false positives, which matters especially in the kernel, with its
> +   numerous custom synchronization mechanisms. As a result, KCSAN keeps
> +   maintenance overheads minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used even in
> + * compilation units that selectively disable KCSAN but must still use KCSAN to
> + * validate an access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, these simply
> + * alias to kcsan_check_watchpoint; otherwise they become no-ops.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
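
The comment in struct kcsan_ctx about a flat region nested inside a nestable one took me a moment, so I wrote a small userspace model of it. This is a sketch, not the patch's code: ctx_model and the model_* helpers are hypothetical names, the underflow warning is reduced to an assert, and the kcsan_is_atomic() address list is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the fields of struct kcsan_ctx from the patch. */
struct ctx_model {
	int disable;
	int atomic_next;
	int atomic_region;
	bool atomic_region_flat;
};

static void model_begin_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void model_end_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest) {
		/* The patch warns on a mismatched end; the model just asserts. */
		assert(ctx->atomic_region > 0);
		--ctx->atomic_region;
	} else {
		ctx->atomic_region_flat = false;
	}
}

/* Same decision as is_atomic() in core.c, minus the address list. */
static bool model_is_atomic(struct ctx_model *ctx)
{
	if (ctx->atomic_next > 0) {
		--ctx->atomic_next;
		return true;
	}
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

With separate fields, ending the inner flat region leaves the outer nestable region intact, which is the seqlock reader-inside-writer case the comment describes.
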
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
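
A worked example of the probe order helped me here. The sketch below rewrites SLOT_IDX() as a function. The values of KCSAN_CHECK_ADJACENT and KCSAN_NUM_WATCHPOINTS and the use of plain int operands are my assumptions, since kcsan.h and encoding.h are not quoted in this hunk.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values; the real definitions are not part of this hunk. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64

#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)

/* Function form of SLOT_IDX() with plain int operands (an assumption). */
static int slot_idx(int slot, int i)
{
	assert(i >= 0 && i < CHECK_NUM_SLOTS);
	return (slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
			KCSAN_CHECK_ADJACENT)) % KCSAN_NUM_WATCHPOINTS;
}

/* Print the probe order for one home slot. */
static void print_probe_order(int slot)
{
	for (int i = 0; i < CHECK_NUM_SLOTS; ++i)
		printf("slot %d, i=%d -> %d\n", slot, i, slot_idx(slot, i));
}
```

For a home slot of 5 this probes 5, then 6, then 4, matching the comment. With signed operands, C's % truncates toward zero, so the left neighbour of slot 0 comes out as -1 in this model rather than wrapping to the last slot. Whether that can happen in the patch depends on the actual types, so it may be worth double-checking.
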
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
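
To convince myself the insert/consume/remove protocol is sound, I modelled one slot with C11 atomics. This is a sketch under assumptions: INVALID_WP is 0 as the comment on watchpoints[] states, but the value of CONSUMED_WP is made up, and the slot_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* INVALID is 0 per the watchpoints[] comment; CONSUMED's value is assumed. */
#define INVALID_WP 0L
#define CONSUMED_WP 1L

static atomic_long slot; /* zero-initialized, i.e. INVALID_WP */

/* Claim the slot only if it is free (cf. insert_watchpoint()). */
static bool slot_insert(long encoded)
{
	long expect = INVALID_WP;

	assert(encoded != INVALID_WP && encoded != CONSUMED_WP);
	return atomic_compare_exchange_strong_explicit(&slot, &expect, encoded,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Consume only if the slot still holds the value we observed. */
static bool slot_try_consume(long observed)
{
	return atomic_compare_exchange_strong_explicit(&slot, &observed,
						       CONSUMED_WP,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Free the slot; returns true if nobody consumed it in the meantime. */
static bool slot_remove(void)
{
	return atomic_exchange_explicit(&slot, INVALID_WP,
					memory_order_relaxed) != CONSUMED_WP;
}
```

The owner learns about a race from the return value of slot_remove(), and at most one racing thread wins slot_try_consume(), which matches the three failure cases listed in the comment above try_consume_watchpoint().
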
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupts, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) the atomic_next counter
> +        * must actually be decremented for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
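
For reference, the sampling rule reduces to "watch every Nth non-atomic access per CPU". A minimal userspace sketch, assuming a skip count of 1000 (the patch reads it from CONFIG_KCSAN_WATCH_SKIP_INST, whose default is not shown here) and one global counter in place of the per-CPU kcsan_skip:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed skip count; the patch takes it from Kconfig. */
#define WATCH_SKIP_INST 1000UL

static unsigned long skip_counter; /* stand-in for the per-CPU kcsan_skip */

/* Increment first, then test, like this_cpu_inc_return() in the patch. */
static bool sample_should_watch(void)
{
	assert(WATCH_SKIP_INST > 0);
	return (++skip_counter % WATCH_SKIP_INST) == 0;
}

/* Count how many of the next `accesses` accesses would be watched. */
static unsigned long count_watched(unsigned long accesses)
{
	unsigned long watched = 0;

	for (unsigned long i = 0; i < accesses; ++i)
		watched += sample_should_watch();
	return watched;
}
```

The counter itself is deterministic. As the comment says, the unpredictability comes from which instruction happens to land on the Nth count on each CPU.
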
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
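
A note on the range this produces: with randomization the delay falls in [1, max_delay], never 0. The sketch below models that in userspace. It substitutes rand() for prandom_u32(), the function name and the example maximum of 80 are hypothetical, and a non-zero maximum is assumed since the modulo would otherwise divide by zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of get_delay(); rand() stands in for prandom_u32(). */
static unsigned int model_get_delay(unsigned int max_delay, int randomize)
{
	assert(max_delay > 0); /* modulo by zero otherwise */
	return randomize ? ((unsigned int)rand() % max_delay) + 1 : max_delay;
}
```
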
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
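
For anyone following the "read, stall, re-read" logic above, here is a minimal userspace sketch of just the value-comparison part. It assumes nothing about the kernel environment: `union snapshot`, `take_snapshot()` and `value_as_expected()` are hypothetical names, plain volatile loads stand in for READ_ONCE(), and the stall itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the expect_value union in the patch. */
union snapshot {
	uint8_t  _1;
	uint16_t _2;
	uint32_t _4;
	uint64_t _8;
};

/*
 * Record the current value at ptr. Returns false for sizes that are not
 * compared (anything other than 1, 2, 4 or 8 bytes).
 */
static bool take_snapshot(union snapshot *snap, const volatile void *ptr,
			  size_t size)
{
	assert(snap != NULL && ptr != NULL);
	memset(snap, 0, sizeof(*snap));

	switch (size) {
	case 1: snap->_1 = *(const volatile uint8_t *)ptr;  return true;
	case 2: snap->_2 = *(const volatile uint16_t *)ptr; return true;
	case 4: snap->_4 = *(const volatile uint32_t *)ptr; return true;
	case 8: snap->_8 = *(const volatile uint64_t *)ptr; return true;
	default: return false;
	}
}

/*
 * Re-read the value and compare it with the snapshot. Sizes that were not
 * recorded are treated as "expected", mirroring is_expected = true above.
 */
static bool value_as_expected(const union snapshot *snap,
			      const volatile void *ptr, size_t size)
{
	assert(snap != NULL && ptr != NULL);

	switch (size) {
	case 1: return snap->_1 == *(const volatile uint8_t *)ptr;
	case 2: return snap->_2 == *(const volatile uint16_t *)ptr;
	case 4: return snap->_4 == *(const volatile uint32_t *)ptr;
	case 8: return snap->_8 == *(const volatile uint64_t *)ptr;
	default: return true;
	}
}
```

A caller would take the snapshot, wait, and then treat `!value_as_expected(...)` as a race of unknown origin if no instrumented access hit the watchpoint in the meantime.
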
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
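
The lookup above is a "sort lazily, then binary-search" scheme. A rough userspace sketch, assuming qsort()/bsearch() from libc in place of the kernel's sort()/bsearch() and leaving out the locking and the kallsyms step; `struct filterlist` and `skip_report()` are names made up for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified version of report_filterlist (no lock). */
struct filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of valid entries */
	bool sorted;          /* cleared whenever an entry is appended */
	bool whitelist;       /* false: blacklist, true: whitelist */
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for the function at func_start should be dropped. */
static bool skip_report(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	assert(fl != NULL);
	if (fl->used == 0)
		return false; /* empty list never filters, as in the patch */

	/* Pay the sorting cost only on the first lookup after an insert. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(fl->addrs[0]), cmp_addrs);
		fl->sorted = true;
	}

	found = bsearch(&func_start, fl->addrs, fl->used,
			sizeof(fl->addrs[0]), cmp_addrs) != NULL;

	/* A whitelist inverts the meaning: skip everything NOT listed. */
	return fl->whitelist ? !found : found;
}
```

Note that, as in the quoted code, an empty list returns false before the whitelist inversion, so an empty whitelist does not suppress every report.
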
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
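
To make the accepted commands explicit, here is a small userspace sketch of the same dispatch. The `enum cmd` values and `parse_cmd()` are hypothetical; the comparisons are copied from the handler above, so they inherit its prefix matching (the compare length is that of the literal, not of the input).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command classification for the debugfs write handler. */
enum cmd {
	CMD_INVALID,
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_ADD_FUNC,
};

/*
 * Classify an already-stripped input string. For CMD_ADD_FUNC, *func is set
 * to the function name following the '!' prefix; otherwise it is NULL.
 */
static enum cmd parse_cmd(const char *arg, const char **func)
{
	assert(arg != NULL && func != NULL);
	*func = NULL;

	if (!strncmp(arg, "on", sizeof("on") - 1))
		return CMD_ON;
	if (!strncmp(arg, "off", sizeof("off") - 1))
		return CMD_OFF;
	if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
		return CMD_WHITELIST;
	if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
		return CMD_BLACKLIST;
	if (arg[0] == '!') {
		*func = &arg[1];
		return CMD_ADD_FUNC;
	}
	return CMD_INVALID;
}
```

Because only the first `sizeof(literal) - 1` characters are compared, an input such as "online" is classified the same as "on".
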
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both of these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
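
A worked example of the bit layout may help here. The sketch below assumes a 64-bit long, 4 KiB pages and KCSAN_CHECK_ADJACENT == 1, which gives a maximum encodable size of 8192 and therefore 14 size bits and 49 address bits. The masks are derived directly from those bit counts rather than copied from the macros above, and `encode()`/`decode()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed configuration: 64-bit words, 4 KiB pages, one adjacent slot. */
#define WORD_BITS 64
#define SIZE_BITS 14 /* bits needed to represent 8192 */
#define ADDR_BITS (WORD_BITS - 1 - SIZE_BITS) /* 49 */

#define WRITE_MASK (UINT64_C(1) << (WORD_BITS - 1))     /* bit 63 */
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)     /* bits 48..0 */
#define SIZE_MASK  (~(WRITE_MASK | ADDR_MASK))          /* bits 62..49 */

/* Pack is-write, size and the low address bits into one word. */
static uint64_t encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= 8192);
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Unpack a word; 0 and 1 are reserved (invalid / consumed watchpoint). */
static bool decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		   bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

Two addresses that differ only above bit 48 encode to the same word in this layout, which is the aliasing case the comment in encoding.h describes.
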
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
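
The slot mapping and the overlap test are easy to check by hand. A userspace sketch, assuming 4 KiB pages and the 64 watchpoint slots defined in kcsan.h; `slot_of()` and `ranges_overlap()` are hypothetical names for the two helpers above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_RANGE      4096UL /* assumed page size */
#define NUM_WATCHPOINTS 64UL   /* KCSAN_NUM_WATCHPOINTS in the patch */

/* All addresses within one page share a slot; slots wrap every 64 pages. */
static int slot_of(unsigned long addr)
{
	return (int)((addr / SLOT_RANGE) % NUM_WATCHPOINTS);
}

/* Closed-interval overlap test on [addr, addr + size - 1]. */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);
	end1 = addr1 + size1 - 1;
	end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

For example, a 4-byte access at 0x1000 overlaps a 1-byte access at 0x1003, but not a 4-byte access at 0x1004.
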
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
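
In case the token pasting is not obvious: each macro invocation stamps out one read hook and one write hook per access size, all forwarding to the same check routine. A userspace sketch with made-up names (`demo_read##size`, `check_read()` and so on) that records the size instead of calling into KCSAN:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for __kcsan_check_{read,write}: record the size. */
static size_t last_read_size, last_write_size;

static void check_read(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	last_read_size = size;
}

static void check_write(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	last_write_size = size;
}

/*
 * Generate demo_read<size>() and demo_write<size>(). Unlike the macro in the
 * patch, this one ends in a complete function definition, so it is invoked
 * without a trailing semicolon.
 */
#define DEFINE_DEMO_READ_WRITE(size)                                           \
	void demo_read##size(void *ptr)                                        \
	{                                                                      \
		check_read(ptr, size);                                         \
	}                                                                      \
	void demo_write##size(void *ptr)                                       \
	{                                                                      \
		check_write(ptr, size);                                        \
	}

DEFINE_DEMO_READ_WRITE(1)
DEFINE_DEMO_READ_WRITE(4)
DEFINE_DEMO_READ_WRITE(8)
```

The compiler-emitted hooks carry the access size in their name, which is why the generated bodies can pass it on as a constant.
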
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
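
The frame skipping is a plain substring filter over symbolized entries. A userspace sketch operating on already-symbolized names (the kernel version formats each address with "%ps" first); `skip_runtime_frames()` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first frame whose name does not look like part of
 * the sanitizer runtime. Returns num_entries if every frame matches.
 */
static int skip_runtime_frames(const char *const names[], int num_entries)
{
	int skip = 0;

	assert(names != NULL || num_entries == 0);
	for (; skip < num_entries; ++skip) {
		if (!strstr(names[skip], "csan_") &&
		    !strstr(names[skip], "tsan_") &&
		    !strstr(names[skip], "_once_size"))
			break;
	}
	return skip;
}
```

Only the leading run of matching frames is skipped; a later frame that happens to contain one of the substrings is left in the trace.
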
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
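
The five cases above read as a byte-range overlap check: two accesses match when [addr, addr + size) of one intersects that of the other. A minimal user-space sketch under that assumption follows. The name ranges_overlap() is hypothetical and is not taken from encoding.h:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): true if the byte ranges
 * [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least one byte.
 * Sizes are assumed to be non-zero.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of range 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of range 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

Using inclusive end addresses avoids an overflow when a range ends at the very top of the address space.
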
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
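
To make the sampling and delay knobs concrete, here is a rough user-space sketch of how a skip counter and a bounded delay could behave. Everything in it (should_watch(), pick_delay_us(), the single global counter) is an assumption for illustration; the patch keeps the counter per CPU and may compute the delay differently:

```c
#include <stdbool.h>
#include <stdlib.h>

#define WATCH_SKIP 2000 /* mirrors the default of KCSAN_WATCH_SKIP_INST */

static long skip_count = WATCH_SKIP; /* per-CPU in the kernel; global here */

/* Returns true for 1 in WATCH_SKIP calls, i.e. when a watchpoint is due. */
static bool should_watch(void)
{
	if (--skip_count > 0)
		return false;
	skip_count = WATCH_SKIP;
	return true;
}

/*
 * Picks a delay in microseconds: the maximum if randomization is off,
 * otherwise a value in [1, max_us].
 */
static unsigned int pick_delay_us(unsigned int max_us, bool randomize)
{
	if (!randomize || max_us == 0)
		return max_us;
	return max_us - (unsigned int)rand() % max_us;
}
```

With the default of 2000, 10000 instrumented accesses on one CPU would set up 5 watchpoints in this model.
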
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories

s/ConcurrencySanitizer/KCSAN/
This is the only mention of "ConcurrencySanitizer" in the whole patch.

> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 10:09     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:09 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
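
The definition above can be written down as a predicate. The sketch below is only an illustration: struct mem_access and both helpers are hypothetical, "same memory location" is approximated by overlapping byte ranges, and whether two accesses are ordered by happens-before is passed in rather than derived from the LKMM:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_access {
	int thread;      /* hypothetical thread identifier */
	uintptr_t addr;  /* start address of the access */
	size_t size;     /* size in bytes, non-zero */
	bool is_write;
	bool is_marked;  /* READ_ONCE/WRITE_ONCE/atomic_* style access */
};

/* Two accesses conflict if they overlap and at least one is a write. */
static bool accesses_conflict(const struct mem_access *a,
			      const struct mem_access *b)
{
	bool overlap = a->addr < b->addr + b->size &&
		       b->addr < a->addr + a->size;

	return overlap && (a->is_write || b->is_write);
}

/* Data-race: different threads, conflicting, one plain, unordered. */
static bool is_data_race(const struct mem_access *a,
			 const struct mem_access *b, bool ordered_by_hb)
{
	return a->thread != b->thread && accesses_conflict(a, b) &&
	       (!a->is_marked || !b->is_marked) && !ordered_by_hb;
}
```

A plain write racing with a READ_ONCE() read still counts, since only one side is marked; two marked accesses, or two reads, never do.
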
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
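
For illustration, packing access type, size, and address into one 64-bit word could look like the sketch below. The bit layout (1 write bit, 14 size bits, 49 address bits) and all wp_* names are assumptions made for this example and are not copied from encoding.h:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bit 63 = write flag, bits 49..62 = size, bits 0..48 = address. */
#define WP_WRITE_BIT (UINT64_C(1) << 63)
#define WP_SIZE_BITS 14
#define WP_ADDR_BITS (63 - WP_SIZE_BITS)
#define WP_SIZE_MASK ((UINT64_C(1) << WP_SIZE_BITS) - 1)
#define WP_ADDR_MASK ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_INVALID UINT64_C(0) /* never produced by wp_encode(): size >= 1 */

/* Only sizes that fit in the size field can be encoded. */
static bool wp_encodable(size_t size)
{
	return size >= 1 && size <= WP_SIZE_MASK;
}

static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Returns false for the invalid (empty) watchpoint, true otherwise. */
static bool wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	if (wp == WP_INVALID)
		return false;
	*addr_masked = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_ADDR_BITS) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the high address bits are dropped, a decoded address can only be compared against a masked address, which is also what the selftest in test.c does with WATCHPOINT_ADDR_MASK.
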
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
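
The three steps above can be modelled in user space roughly as follows. This is a single-threaded simulation under loud assumptions: one watchpoint slot instead of an array of encoded longs, no atomics, and the "concurrent" access is injected through a callback that stands in for the delay. None of the names come from core.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-slot soft watchpoint. */
struct watchpoint {
	bool active;
	uintptr_t addr;
	size_t size;
	bool is_write;
};

static struct watchpoint wp;
static int races_found;    /* races detected via a matching watchpoint */
static int unknown_origin; /* races inferred from a value change */
static int shared;         /* the memory location under test */

/* Step 1: an access conflicts with the watchpoint if one side writes. */
static bool check_watchpoint(const void *ptr, size_t size, bool is_write)
{
	uintptr_t addr = (uintptr_t)ptr;

	if (wp.active && addr < wp.addr + wp.size &&
	    wp.addr < addr + size && (is_write || wp.is_write)) {
		races_found++;
		return true;
	}
	return false;
}

/* Steps 2 and 3: set up a watchpoint, stall, then re-check the value. */
static void setup_watchpoint(const void *ptr, size_t size, bool is_write,
			     void (*delay)(void))
{
	unsigned char before[8];
	int hits = races_found;

	assert(size <= sizeof(before));
	memcpy(before, ptr, size);
	wp = (struct watchpoint){ true, (uintptr_t)ptr, size, is_write };
	delay(); /* stands in for the udelay() stall */
	wp.active = false;
	if (races_found == hits && memcmp(before, ptr, size) != 0)
		unknown_origin++;
}

/* An instrumented read from "another thread" during the stall. */
static void delay_with_racing_read(void)
{
	int v;

	check_watchpoint(&shared, sizeof(shared), false);
	v = shared;
	(void)v;
}

/* An uninstrumented write during the stall, e.g. a device via DMA. */
static void delay_with_uninstrumented_write(void)
{
	shared++;
}
```

Watching a write while delay_with_racing_read() runs bumps races_found; watching a read while delay_with_uninstrumented_write() runs bumps unknown_origin, which corresponds to the "race at unknown origin" report.
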
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used
> + * even in compilation units that selectively disable KCSAN but must still use
> + * KCSAN to validate accesses to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
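
For readers following the macros in kcsan-checks.h, here is a minimal userspace sketch of the check-then-setup pattern, assuming nothing beyond what the header above shows. Every model_* name, the call counters, and the model_race_pending flag are hypothetical stand-ins for illustration; they are not part of the patch and do not reflect the real runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime; not the kernel implementation. */
static bool model_race_pending; /* pretend a matching watchpoint exists */
static int model_check_calls;
static int model_setup_calls;

static bool model_check_watchpoint(const volatile void *ptr, size_t size,
				   bool is_write)
{
	(void)ptr;
	(void)size;
	(void)is_write;
	model_check_calls++;
	/* true means: no race seen, the caller may set up a watchpoint. */
	return !model_race_pending;
}

static void model_setup_watchpoint(const volatile void *ptr, size_t size,
				   bool is_write)
{
	(void)ptr;
	(void)size;
	(void)is_write;
	model_setup_calls++;
}

/* Same shape as kcsan_check_read(): set up only if the check passed. */
#define model_check_read(ptr, size)                                 \
	do {                                                        \
		if (model_check_watchpoint(ptr, size, false))       \
			model_setup_watchpoint(ptr, size, false);   \
	} while (0)

/* Same shape as kcsan_check_atomic_read(): check only, never set up. */
#define model_check_atomic_read(ptr, size)                          \
	model_check_watchpoint(ptr, size, false)

static void model_reset(void)
{
	model_race_pending = false;
	model_check_calls = 0;
	model_setup_calls = 0;
}
```

In this model a plain read that passes the check goes on to the setup step, a plain read that hits a pending race does not, and an atomic read only ever performs the check.
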
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
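
The comment in struct kcsan_ctx about nesting a flat region inside a nestable one can be sketched in userspace as follows. This is only a model under stated assumptions: struct model_ctx and the model_* helpers are hypothetical, and the mismatch WARN from the patch is replaced by a simple clamp at zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace copy of the two region fields in struct kcsan_ctx. */
struct model_ctx {
	int atomic_region;       /* nesting depth of nestable regions */
	bool atomic_region_flat; /* set while inside a flat region */
};

static void model_begin_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void model_end_atomic(struct model_ctx *ctx, bool nest)
{
	if (nest) {
		/* The patch warns on mismatch; this model only clamps at 0. */
		if (ctx->atomic_region > 0)
			--ctx->atomic_region;
	} else {
		ctx->atomic_region_flat = false;
	}
}

/* Mirrors the region test in is_atomic(): either kind of region counts. */
static bool model_in_atomic(const struct model_ctx *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

Because the flat flag and the nesting counter are stored separately, ending the inner flat region (the seqlock reader in the comment) leaves the outer nestable region (the seqlock writer) active.
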
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
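
To make the visiting order of SLOT_IDX concrete, here is a small sketch. The MODEL_* values are assumptions (the real KCSAN_CHECK_ADJACENT and KCSAN_NUM_WATCHPOINTS are defined in kcsan.h, which is not shown here). Note that in C the % operator keeps the sign of a negative left operand, so if all operands are signed int, the left neighbour of slot 0 would evaluate to -1. Whether that can happen depends on the types in kcsan.h and encoding.h. The helper below is a wrap-safe variant, not the macro from the patch.

```c
#include <assert.h>

/* Assumed values for illustration only; not taken from the patch. */
#define MODEL_CHECK_ADJACENT 1
#define MODEL_NUM_WATCHPOINTS 64
#define MODEL_CHECK_NUM_SLOTS (1 + 2 * MODEL_CHECK_ADJACENT)

/* Signed offset visited at step i: 0, +1, -1 when MODEL_CHECK_ADJACENT is 1. */
static int model_slot_offset(int i)
{
	return ((i + MODEL_CHECK_ADJACENT) % MODEL_CHECK_NUM_SLOTS) -
	       MODEL_CHECK_ADJACENT;
}

/* Wrap-safe index: add the table size before reducing, so it is never < 0. */
static int model_slot_idx(int slot, int i)
{
	return (slot + model_slot_offset(i) + MODEL_NUM_WATCHPOINTS) %
	       MODEL_NUM_WATCHPOINTS;
}
```

With these assumed values the order is the slot itself, then the right neighbour, then the left neighbour, matching the comment above the macro.
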
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
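
The three helpers above (insert, try-consume, remove) form a small lock-free state machine per slot. A userspace sketch with C11 atomics, assuming sentinel values of 0 and 1 (the real INVALID_WATCHPOINT and CONSUMED_WATCHPOINT come from encoding.h, which is not shown here; only the zero-initialized INVALID state is stated in the patch), might look like this. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed sentinels; the CONSUMED value in particular is a guess. */
#define MODEL_INVALID_WATCHPOINT 0L
#define MODEL_CONSUMED_WATCHPOINT 1L

/* INVALID -> encoded: succeeds only if the slot is free. */
static bool model_insert(atomic_long *wp, long encoded)
{
	long expect = MODEL_INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded, memory_order_relaxed,
		memory_order_relaxed);
}

/* encoded -> CONSUMED: succeeds for exactly one racing accessor. */
static bool model_try_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &encoded, MODEL_CONSUMED_WATCHPOINT, memory_order_relaxed,
		memory_order_relaxed);
}

/* anything -> INVALID: returns false if the slot had been consumed. */
static bool model_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, MODEL_INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       MODEL_CONSUMED_WATCHPOINT;
}
```

The owner of the watchpoint learns about a race from the return value of the remove step, while the racing accessor learns about it from a successful consume.
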
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * ctx->atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
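
The sampling decision in should_watch() can be modelled in a few lines. This is a sketch under assumptions: MODEL_WATCH_SKIP_INST stands in for CONFIG_KCSAN_WATCH_SKIP_INST with an arbitrary value of 4, and a single global counter replaces the per-CPU counter used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_KCSAN_WATCH_SKIP_INST. */
#define MODEL_WATCH_SKIP_INST 4

/* One counter here; the patch keeps one per CPU. */
static unsigned long model_skip;

static bool model_should_watch(bool access_is_atomic)
{
	/* Atomics return early and do not advance the skip counter. */
	if (access_is_atomic)
		return false;

	/* Pre-increment mirrors this_cpu_inc_return(): test the new value. */
	return (++model_skip % MODEL_WATCH_SKIP_INST) == 0;
}
```

With the assumed skip value of 4, every fourth non-atomic access is selected for a watchpoint and atomic accesses leave the counter untouched.
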
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment the data_races counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true: list is a whitelist; false: blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
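The lookup in kcsan_skip_report() is a lazy sort followed by a binary
search, with the result inverted in whitelist mode. A minimal userspace
sketch of just that logic, assuming qsort()/bsearch() from libc in place
of the kernel's sort()/bsearch() and leaving out the spinlock and the
kallsyms lookup; struct filterlist and skip_report() are hypothetical
names for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for report_filterlist (no locking). */
struct filterlist {
	unsigned long *addrs;
	size_t used;
	bool sorted;
	bool whitelist;
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for func_addr should be suppressed. */
static bool skip_report(struct filterlist *fl, unsigned long func_addr)
{
	bool found;

	assert(fl != NULL);
	if (fl->used == 0)
		return false; /* an empty list filters nothing, in either mode */

	/* Sort once after the last insertion, then binary-search. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_addr, fl->addrs, fl->used,
			sizeof(*fl->addrs), cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip all others. */
	return fl->whitelist ? !found : found;
}
```

Note the early return: as in the patch, switching to whitelist mode
with an empty list does not suppress any reports.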
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       /* Reuse addr from the lookup above rather than resolving it again. */
> +       report_filterlist.addrs[report_filterlist.used++] = addr;
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
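One property of the command parsing in debugfs_write() worth keeping in
mind: each strncmp() only compares the length of the keyword, so this
is a prefix match. A small userspace sketch of the same dispatch,
assuming nothing beyond libc; enum model_cmd, MODEL_HAS_PREFIX and
model_parse() are hypothetical names, not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum model_cmd {
	MODEL_ON,
	MODEL_OFF,
	MODEL_WHITELIST,
	MODEL_BLACKLIST,
	MODEL_FILTER,
	MODEL_INVALID,
};

/* True if arg begins with the string literal lit (same test as the patch). */
#define MODEL_HAS_PREFIX(arg, lit) (strncmp((arg), (lit), sizeof(lit) - 1) == 0)

/* Classify one already-stripped command; *func is set only for "!name". */
static enum model_cmd model_parse(const char *arg, const char **func)
{
	assert(arg != NULL && func != NULL);
	*func = NULL;

	if (MODEL_HAS_PREFIX(arg, "on"))
		return MODEL_ON;
	if (MODEL_HAS_PREFIX(arg, "off"))
		return MODEL_OFF;
	if (MODEL_HAS_PREFIX(arg, "whitelist"))
		return MODEL_WHITELIST;
	if (MODEL_HAS_PREFIX(arg, "blacklist"))
		return MODEL_BLACKLIST;
	if (arg[0] == '!') {
		*func = arg + 1; /* function name to add to the filter list */
		return MODEL_FILTER;
	}
	return MODEL_INVALID;
}
```

With this scheme an input such as "online" is accepted as "on". That
may be acceptable for a debugfs knob, but an exact comparison would be
stricter.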
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> + * bits; however, most 64-bit architectures do not use the full 64-bit address
> + * space. Also, for a false positive to be observable, 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, WATCHPOINT_ADDR_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(WATCHPOINT_ADDR_BITS - 1, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
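To make the bit layout concrete, here is a userspace model of the
encode/decode pair. It is only a sketch under stated assumptions: a
64-bit long, 4 KiB pages and KCSAN_CHECK_ADJACENT == 1, so that
MAX_ENCODABLE_SIZE is 8192 and 14 bits hold the size. The MODEL_*
names are hypothetical, and the masks are derived directly from the
address bit count so that write bit, size field and address field
partition the word exactly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, one adjacent slot. */
#define MODEL_MAX_SIZE   (4096u * 2)
#define MODEL_SIZE_BITS  14 /* bits needed to represent 8192 */
#define MODEL_ADDR_BITS  (64 - 1 - MODEL_SIZE_BITS)

/* Layout, high to low: [1 write bit][SIZE_BITS size][ADDR_BITS address]. */
#define MODEL_WRITE_MASK (UINT64_C(1) << 63)
#define MODEL_ADDR_MASK  ((UINT64_C(1) << MODEL_ADDR_BITS) - 1)
#define MODEL_SIZE_MASK  (~(MODEL_WRITE_MASK | MODEL_ADDR_MASK))

static uint64_t model_encode(uint64_t addr, size_t size, bool is_write)
{
	/* size >= 1 also keeps the result distinct from the 0/1 sentinels. */
	assert(size >= 1 && size <= MODEL_MAX_SIZE);
	return (is_write ? MODEL_WRITE_MASK : 0) |
	       ((uint64_t)size << MODEL_ADDR_BITS) |
	       (addr & MODEL_ADDR_MASK);
}

static bool model_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			 bool *is_write)
{
	if (wp == 0 || wp == 1) /* INVALID_WATCHPOINT, CONSUMED_WATCHPOINT */
		return false;

	*addr_masked = wp & MODEL_ADDR_MASK;
	*size = (size_t)((wp & MODEL_SIZE_MASK) >> MODEL_ADDR_BITS);
	*is_write = (wp & MODEL_WRITE_MASK) != 0;
	return true;
}
```

Only the low address bits survive the round trip, which is why the
report path has to re-check the full addresses before printing.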
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
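For reference, the slot selection and the overlap test reduce to a few
lines of arithmetic. A userspace sketch, assuming 4 KiB pages and the
64 watchpoints from kcsan.h; model_slot() and model_overlap() are
hypothetical names that mirror watchpoint_slot() and matching_access():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE       4096ul /* assumed page size */
#define MODEL_NUM_WATCHPOINTS 64     /* KCSAN_NUM_WATCHPOINTS in the patch */

/* All addresses within one page share a slot; slots wrap every 64 pages. */
static int model_slot(unsigned long addr)
{
	return (int)((addr / MODEL_PAGE_SIZE) % MODEL_NUM_WATCHPOINTS);
}

/*
 * Two closed ranges [addr, addr + size - 1] overlap if and only if each
 * one starts no later than the other one ends.
 */
static bool model_overlap(unsigned long addr1, size_t size1,
			  unsigned long addr2, size_t size2)
{
	assert(size1 > 0 && size2 > 0);
	return addr1 <= addr2 + size2 - 1 && addr2 <= addr1 + size1 - 1;
}
```

For example, a 2-byte access at 10 overlaps a 1-byte access at 11,
while 1-byte accesses at 10 and 11 do not overlap.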
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
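As an illustration of what these hooks amount to: the compiler places a
call to the size-specific function ahead of each plain load or store.
The userspace sketch below imitates that by hand. Everything prefixed
model_ is hypothetical, the byte counters merely stand in for
__kcsan_check_{read,write}, and the "instrumented" copies are written
manually rather than generated by a compiler:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for __kcsan_check_{read,write}: count bytes. */
static size_t model_read_bytes;
static size_t model_write_bytes;

static void model_check_read(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	model_read_bytes += size;
}

static void model_check_write(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	model_write_bytes += size;
}

/* One read hook and one write hook per access size, as in the patch. */
#define DEFINE_MODEL_READ_WRITE(size)                                          \
	static void model_read##size(void *ptr)                                \
	{                                                                      \
		model_check_read(ptr, size);                                   \
	}                                                                      \
	static void model_write##size(void *ptr)                               \
	{                                                                      \
		model_check_write(ptr, size);                                  \
	}

DEFINE_MODEL_READ_WRITE(4)
DEFINE_MODEL_READ_WRITE(8)

/* Roughly what an instrumented "*dst = *src" looks like for 4 bytes. */
static void model_copy4(int *dst, int *src)
{
	model_read4(src);
	model_write4(dst);
	*dst = *src;
}

/* The same for an 8-byte access. */
static void model_copy8(long long *dst, long long *src)
{
	model_read8(src);
	model_write8(dst);
	*dst = *src;
}
```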
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses whose size makes them straddle a slot boundary can check
> + *        adjacent slots for a matching watchpoint.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will:
> + *
> + *     1. avoid excessive contention between watchpoint checks and setup;
> + *     2. allow a larger number of simultaneous watchpoints without
> + *        sacrificing performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads must be serialized to print the reports
> + * anyway, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
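The frame skipping in get_stack_skipnr() is a substring test on the
symbolized entries. A userspace sketch of the same idea, assuming the
frames are already available as symbol strings (the kernel code formats
them with "%ps" first); model_is_internal() and model_skipnr() are
hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the symbol looks like KCSAN/TSAN runtime or a *_once_size helper. */
static bool model_is_internal(const char *sym)
{
	return strstr(sym, "csan_") != NULL ||
	       strstr(sym, "tsan_") != NULL ||
	       strstr(sym, "_once_size") != NULL;
}

/* Number of leading frames to drop; only the leading run is skipped. */
static size_t model_skipnr(const char *const syms[], size_t n)
{
	size_t skip = 0;

	assert(syms != NULL || n == 0);
	while (skip < n && model_is_internal(syms[skip]))
		++skip;
	return skip;
}
```

If every frame matches, the result equals the number of entries, so a
caller indexing the array with it would need to handle that case.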
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
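
For readers following test_matching_access(): the five cases above are consistent with a plain overlap check on two byte ranges. A minimal user-space sketch, assuming that is what matching_access() computes; the name ranges_overlap() is hypothetical and the body is not taken from encoding.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): true if the byte ranges
 * [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least one
 * byte. Sizes are assumed to be non-zero.
 */
static inline bool ranges_overlap(unsigned long addr1, size_t size1,
				  unsigned long addr2, size_t size2)
{
	unsigned long last1 = addr1 + size1 - 1; /* last byte of range 1 */
	unsigned long last2 = addr2 + size2 - 1; /* last byte of range 2 */

	return addr1 <= last2 && addr2 <= last1;
}

/* Replays the five cases from test_matching_access() above. */
static inline void ranges_overlap_selfcheck(void)
{
	assert(ranges_overlap(10, 1, 10, 1));  /* identical ranges */
	assert(ranges_overlap(10, 2, 11, 1));  /* second inside first */
	assert(ranges_overlap(10, 1, 9, 2));   /* first inside second */
	assert(!ranges_overlap(10, 1, 11, 1)); /* adjacent, no shared byte */
	assert(!ranges_overlap(9, 1, 10, 1));  /* adjacent, no shared byte */
}
```

The last two cases are the interesting ones: adjacent accesses must not match, otherwise neighbouring struct fields would produce false reports.
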
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
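
To make the "1 in KCSAN_WATCH_SKIP_INST" wording concrete, here is a small user-space sketch of a skip counter. It is only an illustration under stated assumptions: the names are hypothetical, it uses a single global counter where the help text describes per-CPU counting, and it is not meant to mirror the code in core.c.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SKIP_INST 2000 /* mirrors the Kconfig default above */

/* Single counter for the sketch; the help text describes per-CPU counting. */
static long sketch_skip_counter = SKETCH_SKIP_INST;

/* Returns true for the one access in SKETCH_SKIP_INST that gets a watchpoint. */
static inline bool sketch_should_watch(void)
{
	if (--sketch_skip_counter > 0)
		return false; /* skip this access */

	sketch_skip_counter = SKETCH_SKIP_INST; /* re-arm for the next round */
	return true;
}

/* Over 3 * SKETCH_SKIP_INST accesses, exactly three are selected. */
static inline void sketch_sampling_selfcheck(void)
{
	int hits = 0;

	for (int i = 0; i < 3 * SKETCH_SKIP_INST; ++i)
		hits += sketch_should_watch();
	assert(hits == 3);
}
```

A smaller SKETCH_SKIP_INST selects accesses more often, which is the trade-off between detection rate and overhead that the help text describes.
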
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories

s/ConcurrencySanitizer/KCSAN/
This is the only mention of "ConcurrencySanitizer" in the whole patch.

> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 10:09     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:09 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
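
As an illustration of packing access type, size, and address into one long, here is a user-space sketch. The bit layout is an assumption made for this example (top bit for the access type, the next 14 bits for the size, the remaining low bits for the masked address), and all SKETCH_*/sketch_* names are hypothetical; the actual layout lives in kernel/kcsan/encoding.h. Only an "invalid" value is modelled here, not the consumed one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: [ write:1 | size:14 | masked address: remaining bits ] */
#define SKETCH_LONG_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_SIZE_BITS  14
#define SKETCH_ADDR_BITS  (SKETCH_LONG_BITS - 1 - SKETCH_SIZE_BITS)
#define SKETCH_WRITE_MASK (1UL << (SKETCH_LONG_BITS - 1))
#define SKETCH_SIZE_MASK  (((1UL << SKETCH_SIZE_BITS) - 1) << SKETCH_ADDR_BITS)
#define SKETCH_ADDR_MASK  ((1UL << SKETCH_ADDR_BITS) - 1)
#define SKETCH_INVALID    0UL /* never produced by encode: size is non-zero */

static inline unsigned long sketch_encode(unsigned long addr, size_t size,
					  bool is_write)
{
	/* The size must be non-zero and fit into the size field. */
	assert(size != 0 && size < (1UL << SKETCH_SIZE_BITS));

	return (is_write ? SKETCH_WRITE_MASK : 0UL) |
	       ((unsigned long)size << SKETCH_ADDR_BITS) |
	       (addr & SKETCH_ADDR_MASK);
}

/* Returns false for the invalid value, otherwise fills in the outputs. */
static inline bool sketch_decode(unsigned long wp, unsigned long *addr_masked,
				 size_t *size, bool *is_write)
{
	if (wp == SKETCH_INVALID)
		return false;

	*addr_masked = wp & SKETCH_ADDR_MASK;
	*size = (wp & SKETCH_SIZE_MASK) >> SKETCH_ADDR_BITS;
	*is_write = (wp & SKETCH_WRITE_MASK) != 0;
	return true;
}
```

Because the whole watchpoint fits in one word, it can be published and cleared with single-word stores, which is what makes a lock-free fast path plausible. Note that the high address bits are lost, so decoding only recovers the masked address, as the selftest earlier in the patch also checks.
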
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
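
The three steps above can be sketched as a single-threaded user-space simulation. This is an illustration only: it keeps one unencoded watchpoint slot, does no sampling, and replaces the delay with a callback that plays the role of another CPU. All sketch_* names are hypothetical and none of this is taken from core.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One soft watchpoint, kept unencoded for readability. */
struct sketch_watchpoint {
	uintptr_t addr;
	size_t size;
	bool is_write;
	bool active;
};

static struct sketch_watchpoint sketch_wp; /* single slot for simplicity */

/* Step 1: does this access conflict with the active watchpoint? */
static inline bool sketch_races_with_watchpoint(const void *ptr, size_t size,
						bool is_write)
{
	uintptr_t addr = (uintptr_t)ptr;

	if (!sketch_wp.active)
		return false;
	if (!is_write && !sketch_wp.is_write)
		return false; /* two reads never conflict */
	return addr < sketch_wp.addr + sketch_wp.size &&
	       sketch_wp.addr < addr + size;
}

/*
 * Steps 2 and 3: install a watchpoint, stall, then compare the value seen
 * before and after the stall. Returns true if the value changed, in which
 * case a race of unknown origin would be inferred.
 */
static inline bool sketch_watch_and_delay(const void *ptr, size_t size,
					  bool is_write, void (*stall)(void))
{
	unsigned char before[8];
	bool changed;

	assert(size <= sizeof(before)); /* sketch handles small accesses only */
	memcpy(before, ptr, size);
	sketch_wp = (struct sketch_watchpoint){
		.addr = (uintptr_t)ptr, .size = size,
		.is_write = is_write, .active = true,
	};
	stall(); /* stands in for the delay */
	changed = memcmp(before, ptr, size) != 0;
	sketch_wp.active = false;
	return changed;
}

static int sketch_shared;          /* memory location being watched */
static bool sketch_hit_watchpoint; /* set when step 1 fires */

/* Plays the role of another CPU writing during the stall. */
static void sketch_other_writer(void)
{
	sketch_hit_watchpoint = sketch_races_with_watchpoint(
		&sketch_shared, sizeof(sketch_shared), true);
	sketch_shared = 42;
}

/* A plain read is watched while a conflicting write arrives. */
static inline bool sketch_demo(void)
{
	bool changed = sketch_watch_and_delay(&sketch_shared,
					      sizeof(sketch_shared), false,
					      sketch_other_writer);

	return sketch_hit_watchpoint && changed;
}
```

In the demo both detection paths fire: the instrumented writer finds the watchpoint (step 1), and the watcher sees a changed value after the stall (step 3). If the writer were uninstrumented, or a device doing DMA, only the value comparison would catch it.
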
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
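
To double-check the probe order this macro produces, here is a minimal
userspace sketch. The constants are assumptions (the real values are defined
in kernel/kcsan/kcsan.h, which is not quoted here), and slot_idx() is a
hypothetical name. One deliberate difference from the macro: the sketch adds
the table size before the final modulo, because in plain C the % operator
yields a negative result for a negative left operand. Whether the macro itself
needs that depends on the types of the constants in kcsan.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; the real constants live in kernel/kcsan/kcsan.h. */
#define NUM_WATCHPOINTS 64
#define CHECK_ADJACENT 1
#define CHECK_NUM_SLOTS (1 + 2 * CHECK_ADJACENT)

/*
 * Index of the i-th slot probed for an access whose home slot is `slot`.
 * The offset sequence is 0, +1, ..., +CHECK_ADJACENT, -CHECK_ADJACENT, ..., -1.
 */
static int slot_idx(int slot, int i)
{
	int offset;

	assert(slot >= 0 && slot < NUM_WATCHPOINTS);
	assert(i >= 0 && i < CHECK_NUM_SLOTS);

	offset = ((i + CHECK_ADJACENT) % CHECK_NUM_SLOTS) - CHECK_ADJACENT;
	/* Adding the table size keeps the left operand of % non-negative. */
	return (slot + offset + NUM_WATCHPOINTS) % NUM_WATCHPOINTS;
}
```

With these assumed values the home slot is probed first, then its right
neighbour, then its left neighbour, which matches the comment above the macro.
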
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
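
For readers following along, the slot life cycle implemented by
insert_watchpoint(), try_consume_watchpoint() and remove_watchpoint() can be
modelled in userspace with C11 atomics. This is only a sketch under
assumptions: the slot_*() names are hypothetical, <stdatomic.h> stands in for
the kernel's atomic_long_t API, and a single slot replaces the table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L
#define CONSUMED_WATCHPOINT 1L

/* Owner: claim an empty slot for `encoded` (must not be a reserved value). */
static bool slot_insert(atomic_long *slot, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	assert(encoded != INVALID_WATCHPOINT && encoded != CONSUMED_WATCHPOINT);
	return atomic_compare_exchange_strong_explicit(slot, &expect, encoded,
			memory_order_relaxed, memory_order_relaxed);
}

/* Racing thread: mark the slot consumed, but only if it still holds `seen`. */
static bool slot_consume(atomic_long *slot, long seen)
{
	return atomic_compare_exchange_strong_explicit(slot, &seen,
			CONSUMED_WATCHPOINT,
			memory_order_relaxed, memory_order_relaxed);
}

/* Owner: release the slot; returns false if a racing thread consumed it. */
static bool slot_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, INVALID_WATCHPOINT,
					memory_order_relaxed) != CONSUMED_WATCHPOINT;
}
```

The point of the compare-and-exchange in slot_consume() is that at most one
racing thread wins the consumption, and the owner learns about it from the
value it swaps out on removal.
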
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupts, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
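
The counter-based sampling in should_watch() boils down to "select every N-th
non-atomic access". A small userspace sketch, assuming a thread-local counter
as a stand-in for the per-CPU one and a hypothetical WATCH_SKIP_INST in place
of CONFIG_KCSAN_WATCH_SKIP_INST:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP_INST 4UL

static_assert(WATCH_SKIP_INST > 0, "skip interval must be non-zero");

/* One counter per thread approximates the per-CPU counter in the patch. */
static _Thread_local unsigned long skip_counter;

/* True for every WATCH_SKIP_INST-th call made by the current thread. */
static bool should_sample(void)
{
	return (++skip_counter % WATCH_SKIP_INST) == 0;
}

/* Helper: how many of the next `accesses` calls would be sampled. */
static unsigned int count_samples(unsigned int accesses)
{
	unsigned int hits = 0;

	while (accesses--)
		hits += should_sample();
	return hits;
}
```

In this model the selection is fully deterministic per thread. The
unpredictability mentioned in the comment above comes from interleaving with
interrupts and other tasks on the same CPU, which the sketch does not model.
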
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
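
The interaction between nestable and flat regions is easier to see in a small
model. The sketch below assumes a standalone struct ctx with the same two
fields as struct kcsan_ctx. The helper names are hypothetical, and an assert
replaces the WARN on a mismatched end.

```c
#include <assert.h>
#include <stdbool.h>

struct ctx {
	int atomic_region;       /* depth of nestable regions */
	bool atomic_region_flat; /* currently inside a flat region? */
};

static void begin_atomic(struct ctx *c, bool nest)
{
	if (nest)
		++c->atomic_region;
	else
		c->atomic_region_flat = true;
}

static void end_atomic(struct ctx *c, bool nest)
{
	if (nest) {
		assert(c->atomic_region > 0); /* the patch WARNs instead */
		--c->atomic_region;
	} else {
		c->atomic_region_flat = false;
	}
}

/* Accesses are treated as atomic while either kind of region is open. */
static bool in_atomic(const struct ctx *c)
{
	return c->atomic_region > 0 || c->atomic_region_flat;
}
```

Because the flat flag and the nesting counter are independent, ending a flat
region (a seqlock reader section, for example) inside a nestable one (a seqlock
writer section) leaves the outer region open, which is the case described in
the struct kcsan_ctx comment.
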
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
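
The "race of unknown origin" detection is a snapshot-and-compare around the
delay. A userspace sketch of just that part, with hypothetical helper names
and memcpy() standing in for the READ_ONCE() accesses (so it says nothing
about the atomicity of the reads themselves):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 1, 2, 4 or 8 bytes at ptr into a zero-padded 64-bit snapshot. */
static bool snapshot(const void *ptr, size_t size, uint64_t *out)
{
	assert(ptr != NULL && out != NULL);

	*out = 0;
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false; /* unsupported size: caller skips the diff */
	memcpy(out, ptr, size);
	return true;
}

/* True if memory at ptr no longer matches the earlier snapshot. */
static bool changed_since(const void *ptr, size_t size, uint64_t before)
{
	uint64_t now;

	if (!snapshot(ptr, size, &now))
		return false; /* like the patch: other sizes are not diffed */
	return now != before;
}
```

A caller would take the snapshot right after inserting the watchpoint, wait,
and then call changed_since(). A change with no consumed watchpoint is what the
patch counts as a race of unknown origin.
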
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true: whitelist; false: blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
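
The filter decision in kcsan_skip_report() is a sorted lookup with the result
inverted in whitelist mode. A minimal userspace sketch using qsort() and
bsearch() from <stdlib.h> in place of the kernel's sort() and bsearch(). The
function names are hypothetical, and locking and the cached `sorted` flag are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/*
 * Returns true if a report for `func` should be suppressed. `addrs` is
 * sorted in place on every call; the patch sorts only when needed.
 */
static bool skip_report(unsigned long *addrs, size_t used, bool whitelist,
			unsigned long func)
{
	bool listed;

	if (used == 0)
		return false; /* an empty list filters nothing in either mode */

	assert(addrs != NULL);
	qsort(addrs, used, sizeof(*addrs), cmp_addrs);
	listed = bsearch(&func, addrs, used, sizeof(*addrs), cmp_addrs) != NULL;
	return whitelist ? !listed : listed;
}
```

Note the empty-list case: as in the patch, switching to whitelist mode with no
entries does not suppress every report.
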
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               addr; /* resolved above; no second lookup needed */
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
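
To make the bit layout concrete, here is a user-space sketch of the encoding, assuming a 64-bit long and 4 KiB pages (both are assumptions of this sketch, as are the uint64_t types and the GENMASK64 stand-in). With KCSAN_CHECK_ADJACENT == 1, MAX_ENCODABLE_SIZE is 8192, bits_per(8192) is 14, and WATCHPOINT_ADDR_BITS is 49:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: 64-bit long, 4 KiB pages. */
#define BITS_PER_LONG 64
#define SLOT_RANGE 4096u
#define KCSAN_CHECK_ADJACENT 1
#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
#define WATCHPOINT_SIZE_BITS 14 /* bits_per(8192) */
#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)

/* Stand-in for the kernel's GENMASK(h, l): bits l..h set, inclusive. */
#define GENMASK64(h, l) \
	((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

#define WATCHPOINT_WRITE_MASK (UINT64_C(1) << (BITS_PER_LONG - 1))
#define WATCHPOINT_SIZE_MASK \
	GENMASK64(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
#define WATCHPOINT_ADDR_MASK \
	GENMASK64(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)

static uint64_t encode_watchpoint(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= MAX_ENCODABLE_SIZE);
	return (is_write ? WATCHPOINT_WRITE_MASK : 0) |
	       ((uint64_t)size << WATCHPOINT_ADDR_BITS) |
	       (addr & WATCHPOINT_ADDR_MASK);
}

/* Returns false for the two reserved values (0 invalid, 1 consumed). */
static bool decode_watchpoint(uint64_t wp, uint64_t *addr_masked,
			      size_t *size, bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;
	*addr_masked = wp & WATCHPOINT_ADDR_MASK;
	*size = (size_t)((wp & WATCHPOINT_SIZE_MASK) >> WATCHPOINT_ADDR_BITS);
	*is_write = !!(wp & WATCHPOINT_WRITE_MASK);
	return true;
}
```

Under these assumptions the write flag is bit 63, the size occupies bits 49..62, and WATCHPOINT_ADDR_MASK keeps bits 0..47. Bit 48 therefore appears to be unused: it is covered by WATCHPOINT_SIZE_MASK but shifted out on decode. The round trip still works; the address mask just seems to be one bit narrower than WATCHPOINT_ADDR_BITS suggests.
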
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
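
For readers unfamiliar with the TSAN hooks: the compiler inserts a call before each plain load and store, and the functions above forward those calls to KCSAN. A hand-written user-space sketch of what an instrumented statement might look like (mock_tsan_read4(), mock_tsan_write4(), the counters, and copy_plus_one() are all hypothetical; real instrumentation is generated by the compiler with -fsanitize=thread, not written by hand):

```c
#include <assert.h>
#include <stddef.h>

/* Counters stand in for __kcsan_check_read/write in this mock. */
static size_t reads_seen, writes_seen;

static void mock_tsan_read4(void *ptr)
{
	assert(ptr != NULL);
	++reads_seen;
}

static void mock_tsan_write4(void *ptr)
{
	assert(ptr != NULL);
	++writes_seen;
}

/* Roughly what "*dst = *src + 1" could look like once instrumented. */
static void copy_plus_one(int *dst, int *src)
{
	int tmp;

	mock_tsan_read4(src);  /* hook before the 4-byte load */
	tmp = *src;
	mock_tsan_write4(dst); /* hook before the 4-byte store */
	*dst = tmp + 1;
}
```
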
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * one struct is needed, as all threads must be serialized to print reports
> + * anyway, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
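
The frame-skipping logic above can be tried in user space with symbol names instead of addresses. A sketch, assuming the names have already been resolved to strings (the string-array interface is an assumption of this sketch, and strstr() stands in for the kernel's strnstr()):

```c
#include <assert.h>
#include <string.h>

/* Returns the index of the first frame that is not KCSAN-internal. */
static int get_stack_skipnr(const char *const names[], int num_entries)
{
	int skip = 0;

	assert(names != NULL || num_entries == 0);

	for (; skip < num_entries; ++skip) {
		if (!strstr(names[skip], "csan_") &&
		    !strstr(names[skip], "tsan_") &&
		    !strstr(names[skip], "_once_size"))
			break;
	}
	return skip;
}
```

Note that the result equals num_entries when every frame matches. Callers that index stack_entries[skipnr], as print_summary() does, therefore seem to rely on at least one non-KCSAN frame being present in the trace.
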
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories

s/ConcurrencySanitizer/KCSAN/
This is the only mention of "ConcurrencySanitizer" in the whole patch.

> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>



* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
@ 2019-10-23 10:28     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23 10:28 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately, this allows determining the possible executions of concurrent
> +code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
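
The three steps above can be sketched in user space roughly as follows. This is
a single-threaded illustration only, not the patch's implementation: the bit
layout of the encoded long, the slot count, and all wp_* names are assumptions
made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: bit 63 = write, bits 48-62 = size, bits 0-47 = address. */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 48
#define WP_SIZE_MASK  UINT64_C(0x7fff)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
#define WP_NUM_SLOTS  8

static uint64_t wp_slots[WP_NUM_SLOTS];	/* 0 marks an empty slot */

uint64_t wp_encode(uintptr_t addr, size_t size, bool is_write)
{
	assert(size > 0 && size <= WP_SIZE_MASK);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       ((uint64_t)addr & WP_ADDR_MASK);
}

/* Step 1: a race is seen if a watchpoint overlaps and at least one side writes. */
bool wp_conflicts(uintptr_t addr, size_t size, bool is_write)
{
	uint64_t lo = (uint64_t)addr & WP_ADDR_MASK;

	for (int i = 0; i < WP_NUM_SLOTS; i++) {
		uint64_t wp = wp_slots[i];
		uint64_t wp_lo = wp & WP_ADDR_MASK;
		uint64_t wp_size = (wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK;

		if (wp == 0)
			continue;
		if (lo < wp_lo + wp_size && wp_lo < lo + size &&
		    (is_write || (wp & WP_WRITE_BIT)))
			return true;
	}
	return false;
}

/* Step 2: claim a free slot; returns its index, or -1 if all slots are busy. */
int wp_setup(uintptr_t addr, size_t size, bool is_write)
{
	for (int i = 0; i < WP_NUM_SLOTS; i++) {
		if (wp_slots[i] == 0) {
			wp_slots[i] = wp_encode(addr, size, is_write);
			return i;
		}
	}
	return -1;
}

/* Called once the delay has elapsed. */
void wp_remove(int slot)
{
	assert(slot >= 0 && slot < WP_NUM_SLOTS);
	wp_slots[slot] = 0;
}

/* Step 3: a value that changed across the delay implies a race of unknown origin. */
bool wp_value_changed(const void *ptr, const void *snapshot, size_t size)
{
	return memcmp(ptr, snapshot, size) != 0;
}
```

In this sketch a sampled access would call wp_setup(), copy the current value,
wait, then call wp_value_changed() and wp_remove(); every other access only
calls wp_conflicts().
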
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
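
To make the nestable-versus-flat distinction in the comment above concrete,
here is a small user-space sketch. It only mirrors the two region fields of
struct kcsan_ctx; the sketch_* names and the use of assert() in place of the
"mismatching kcsan_end_atomic()" warning are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors only the two atomic-region fields; names are hypothetical. */
struct sketch_ctx {
	int atomic_region;		/* depth of nestable regions */
	bool atomic_region_flat;	/* currently inside a flat region? */
};

void sketch_begin_atomic(struct sketch_ctx *ctx, bool nest)
{
	if (nest)
		ctx->atomic_region++;
	else
		ctx->atomic_region_flat = true;
}

void sketch_end_atomic(struct sketch_ctx *ctx, bool nest)
{
	if (nest) {
		/* Stands in for the mismatch warning mentioned above. */
		assert(ctx->atomic_region > 0);
		ctx->atomic_region--;
	} else {
		ctx->atomic_region_flat = false;
	}
}

bool sketch_in_atomic(const struct sketch_ctx *ctx)
{
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}
```

Because the flat flag is tracked separately, ending a flat (reader) region
inside a nestable (writer) region leaves the outer region intact, which is the
seqlock case the comment describes.
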
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,

What will happen if we don't disable it?

> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
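
To convince myself of the iteration order, here is a small userspace sketch of the same arithmetic. It assumes KCSAN_CHECK_ADJACENT is 1 (its definition is not in the part quoted so far), and slot_idx()/print_order() are names I made up. If I read the macro right, a signed slot of 0 gives -1 for the left neighbour, because % truncates toward zero in C, so it may be worth double-checking that the index can never go negative.

```c
#include <assert.h>
#include <stdio.h>

#define KCSAN_NUM_WATCHPOINTS 64
#define KCSAN_CHECK_ADJACENT 1 /* assumption, not taken from the patch */
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)

/* Same arithmetic as SLOT_IDX(), written as a function on int. */
static int slot_idx(int slot, int i)
{
	assert(slot >= 0 && slot < KCSAN_NUM_WATCHPOINTS);
	return (slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
			KCSAN_CHECK_ADJACENT)) % KCSAN_NUM_WATCHPOINTS;
}

/* Print the visiting order for one home slot: home, right, left. */
static void print_order(int slot)
{
	for (int i = 0; i < CHECK_NUM_SLOTS; ++i)
		printf("slot %d, i=%d -> %d\n", slot, i, slot_idx(slot, i));
}
```

The right neighbour of slot 63 wraps to 0 as expected; only the left neighbour of slot 0 looks suspicious.
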
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
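
For readers following along, the slot life cycle above (free -> watched -> consumed -> free) can be modelled in userspace with C11 atomics. This is only a sketch under that assumption; the wp_*() names are mine and the kernel's atomic_long_* helpers are replaced by <stdatomic.h> calls with relaxed ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L
#define CONSUMED_WATCHPOINT 1L

/* Claim a free slot; fails if the slot is already in use. */
static bool wp_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	/* The two special values must never be used as an encoding. */
	assert(encoded != INVALID_WATCHPOINT && encoded != CONSUMED_WATCHPOINT);
	return atomic_compare_exchange_strong_explicit(wp, &expect, encoded,
			memory_order_relaxed, memory_order_relaxed);
}

/* Racing access: mark the slot consumed if it still holds 'encoded'. */
static bool wp_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(wp, &encoded,
			CONSUMED_WATCHPOINT,
			memory_order_relaxed, memory_order_relaxed);
}

/* Owner: free the slot; returns false if somebody consumed it. */
static bool wp_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
			memory_order_relaxed) != CONSUMED_WATCHPOINT;
}
```

Only one racing thread can win the consume step, and the owner learns about the race from the return value of the final exchange.
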
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
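
A minimal userspace model of the sampling decision, in case it helps other reviewers: one access in every N is picked, where N stands in for CONFIG_KCSAN_WATCH_SKIP_INST. The value 2000 and the function names are assumptions for illustration, and a single global counter replaces the per-CPU one.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for CONFIG_KCSAN_WATCH_SKIP_INST; the value is an assumption. */
#define WATCH_SKIP_INST 2000UL

/* One counter per CPU in the kernel; a single counter here. */
static unsigned long skip_counter;

/* Returns true for one access in every WATCH_SKIP_INST. */
static bool should_watch_model(void)
{
	return (++skip_counter % WATCH_SKIP_INST) == 0;
}

/* Count how many of 'n' consecutive accesses would be watched. */
static unsigned long count_watched(unsigned long n)
{
	unsigned long watched = 0;

	assert(n > 0);
	while (n--)
		watched += should_watch_model();
	return watched;
}
```

The selection is strictly periodic per counter, so the unpredictability mentioned in the comment comes from how accesses interleave across CPUs, not from the counter itself.
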
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE

if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
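
The point of the IS_ENABLED()-style check is that both branches are always compiled, and the compiler drops the dead one. A userspace imitation of the trick, as a sketch only: all MY_* names and the two CONFIG_* options below are made up, and only the built-in (=y) case is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace imitation of an IS_ENABLED()-style macro (builtin case only).
 * An option defined to 1 expands to 1, an undefined option expands to 0.
 */
#define MY_ARG_PLACEHOLDER_1 0,
#define MY_TAKE_SECOND(ignored, val, ...) val
#define MY_IS_DEFINED3(arg1_or_junk) MY_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define MY_IS_DEFINED2(val) MY_IS_DEFINED3(MY_ARG_PLACEHOLDER_##val)
#define MY_IS_ENABLED(option) MY_IS_DEFINED2(option)

#define CONFIG_EARLY_ENABLE 1 /* hypothetical option, defined like =y */
/* CONFIG_NOT_SET is deliberately left undefined. */

static bool enabled_flag;

/* Both branches are parsed and type-checked; the dead one is dropped. */
static void init_model(void)
{
	if (MY_IS_ENABLED(CONFIG_EARLY_ENABLE))
		enabled_flag = true;
	if (MY_IS_ENABLED(CONFIG_NOT_SET))
		assert(!"not reached: option is not defined");
}
```

Compared with #ifdef, a typo or type error inside the disabled branch still breaks the build, which is the main reason to prefer this form.
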



> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
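
As I understand it, 'disable' is a nesting counter: init_task starts at 1 and the kcsan_enable_current() call in kcsan_init() brings it to 0. A userspace sketch of that counter, with the WARN replaced by a mismatch count; struct ctx_model and the model_*() names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the 'disable' field of struct kcsan_ctx. */
struct ctx_model {
	int disable;    /* nesting depth; 0 means checks are enabled */
	int mismatches; /* counts what would be a WARN in the kernel */
};

static void model_disable(struct ctx_model *ctx)
{
	assert(ctx);
	++ctx->disable;
}

static void model_enable(struct ctx_model *ctx)
{
	assert(ctx);
	if (ctx->disable-- == 0) {
		ctx->disable = 0; /* undo the underflow */
		++ctx->mismatches;
	}
}

static bool model_is_enabled(const struct ctx_model *ctx)
{
	return ctx->disable == 0;
}
```

An unbalanced enable leaves the counter at 0 and is only recorded, which matches my reading of the restore-to-0 dance in the patch.
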
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
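
The lookup above boils down to sort, binary search, and an inversion for whitelist mode. A userspace sketch with qsort()/bsearch() from libc instead of the kernel's sort()/bsearch(); the function names are mine, and locking as well as the cached 'sorted' flag are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Three-way comparison of two addresses, as in cmp_filterlist_addrs(). */
static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/*
 * Returns true if a report for 'func_start' should be skipped.
 * 'addrs' is sorted in place on every call in this sketch.
 */
static bool skip_report_model(unsigned long *addrs, size_t used,
			      bool whitelist, unsigned long func_start)
{
	bool found;

	assert(addrs != NULL || used == 0);
	if (used == 0)
		return false; /* empty list never filters, in either mode */

	qsort(addrs, used, sizeof(addrs[0]), cmp_addrs);
	found = bsearch(&func_start, addrs, used, sizeof(addrs[0]),
			cmp_addrs) != NULL;
	return whitelist ? !found : found;
}
```

Note the empty-list case: in whitelist mode with no entries nothing is skipped, same as in the patch.
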
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
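
A userspace round-trip of the encoding, as a sketch: top bit for is-write, then the size bits, then the masked address. It assumes a 64-bit word and 14 size bits, and it derives both masks directly from the address width, which is a simplification of mine; the quoted WATCHPOINT_SIZE_MASK and WATCHPOINT_ADDR_MASK look one bit off from WATCHPOINT_ADDR_BITS to me, but I may be misreading GENMASK. The wp_*() and WP_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 1 write bit, 14 size bits, 49 address bits. */
#define WP_BITS 64
#define WP_SIZE_BITS 14 /* assumption: enough for sizes up to 2 * 4096 */
#define WP_ADDR_BITS (WP_BITS - 1 - WP_SIZE_BITS)
#define WP_WRITE_MASK (UINT64_C(1) << (WP_BITS - 1))
#define WP_ADDR_MASK ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK (((UINT64_C(1) << WP_SIZE_BITS) - 1) << WP_ADDR_BITS)

/* Pack access type, size and the low address bits into one word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size < (UINT64_C(1) << WP_SIZE_BITS));
	return (is_write ? WP_WRITE_MASK : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack the three fields; the upper address bits are lost. */
static void wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_ADDR_BITS);
	*is_write = (wp & WP_WRITE_MASK) != 0;
}
```

Two addresses that differ only in the discarded upper bits encode to the same word, which is the false-positive case the comment in encoding.h talks about.
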
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
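
matching_access() is the usual closed-interval overlap test on the last byte of each access. A quick userspace sketch with a few boundary cases; ranges_overlap() is my name for it, and the non-zero size check is an assumption I added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if [addr1, addr1 + size1) and [addr2, addr2 + size2) overlap. */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);
	end1 = addr1 + size1 - 1; /* last byte of each access */
	end2 = addr2 + size2 - 1;
	return addr1 <= end2 && addr2 <= end1;
}
```

Adjacent accesses that merely touch, e.g. 4 bytes at 0x1000 and 4 bytes at 0x1004, do not count as overlapping.
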
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).

Strictly speaking, ThreadSanitizer is never spelled with a space (here
and in one other place).


> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by compilers.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets

This should read "maps to func_addr", to match the parameter name below.


> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
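
To make the skipping rule concrete, here is a minimal userspace sketch of
the same substring filter. It assumes the frames are already symbolized
names rather than return addresses; skip_instrumentation_frames() is a
hypothetical name used only for illustration, not something from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of get_stack_skipnr(): frames[] holds
 * symbol names (an assumption; the patch symbolizes addresses via "%ps").
 * Returns the index of the first frame that does not look like KCSAN,
 * TSAN or {READ,WRITE}_ONCE plumbing.
 */
static int skip_instrumentation_frames(const char *const frames[], int num)
{
	int skip = 0;

	for (; skip < num; ++skip) {
		if (!strstr(frames[skip], "csan_") &&
		    !strstr(frames[skip], "tsan_") &&
		    !strstr(frames[skip], "_once_size"))
			break; /* first frame of interest */
	}
	return skip;
}
```

Note that the result equals num when every frame matches, so a caller that
indexes frames[skip] may want to handle that case explicitly.
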
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
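
The title ordering in print_summary() can be illustrated with a small
userspace sketch. It assumes plain function-name strings in place of
symbolized addresses, and format_title() is a hypothetical helper that does
not exist in the patch. The point is only that the title comes out the same
whichever thread ends up printing the report.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a bug title from two function names so that
 * the lexicographically smaller name always comes first.
 */
static void format_title(char *buf, size_t len, const char *this_fn,
			 const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	snprintf(buf, len, "data-race in %s / %s",
		 cmp < 0 ? other_fn : this_fn,
		 cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two arguments yields the same string, which is presumably what
keeps titles stable for deduplication.
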
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
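
The five cases above are consistent with matching_access() being a
closed-interval overlap check. encoding.h is not quoted in this part of the
thread, so the following is only a sketch of what the test appears to
assume; ranges_overlap() is a hypothetical stand-in name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): two accesses match if the
 * byte ranges [addr, addr + size - 1] share at least one byte.
 * Assumes size >= 1 and no address wrap-around.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of access 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of access 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

For example, a 2-byte access at 10 covers bytes 10 and 11, so it overlaps a
1-byte access at 11, whereas two 1-byte accesses at 10 and 11 do not overlap.
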
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
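
To illustrate the "1 in KCSAN_WATCH_SKIP_INST" wording, here is a minimal
single-threaded sketch of counter-based sampling. It assumes a plain global
counter in place of a per-CPU one; WATCH_SKIP_INST and should_watch() are
illustrative names, and this is not a claim about how core.c implements it.

```c
#include <stdbool.h>

#define WATCH_SKIP_INST 2000 /* mirrors the Kconfig default above */

/* Hypothetical single-CPU stand-in for a per-CPU skip counter. */
static unsigned long skip_counter;

/* Returns true for exactly 1 in WATCH_SKIP_INST calls. */
static bool should_watch(void)
{
	return ++skip_counter % WATCH_SKIP_INST == 0;
}
```

With the default of 2000, 10000 instrumented accesses on one CPU would set
up 5 watchpoints; a smaller value samples more often at a higher cost.
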
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
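
The patsubst idiom above is compact, so here is a small sketch of the
decision it encodes. It assumes both variables are either empty or a single
word without whitespace; kcsan_instrument() is a hypothetical name used
only for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the Makefile.lib check: per_object corresponds to
 * $(KCSAN_SANITIZE_$(basetarget).o), per_dir to $(KCSAN_SANITIZE).
 */
static bool kcsan_instrument(const char *per_object, const char *per_dir)
{
	char word[64];

	/* Concatenate object setting, directory setting and a trailing "y". */
	snprintf(word, sizeof(word), "%s%sy", per_object, per_dir);

	/* $(patsubst n%,,word) is empty only if the word starts with 'n'. */
	return word[0] != 'n';
}
```

The consequence is that a per-object setting takes precedence over the
per-directory one, and an object with neither set is instrumented.
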
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 10:28     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:28 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows developers to determine the possible executions of
> +concurrent code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
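
To check my reading of steps 1-3 and the atomic-access rule, here is a
minimal user-space sketch in plain C. It is not the patch's implementation:
struct soft_wp, check_access(), the single watchpoint slot, and the stall
callbacks are hypothetical names of mine, and the delay is modelled as a
callback so that the example stays single-threaded. Passing sampled=false
models an access that only checks for a watchpoint and never sets one up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-slot "soft watchpoint" (the patch encodes many in longs). */
struct soft_wp {
	uintptr_t addr;
	size_t size;
	bool is_write;
	bool armed;
};

enum access_result { NO_RACE, RACE_WITH_WATCHPOINT, RACE_UNKNOWN_ORIGIN };

/* Step 1: an armed watchpoint overlaps this access and one side is a write. */
static bool wp_conflicts(const struct soft_wp *wp, uintptr_t addr,
			 size_t size, bool is_write)
{
	if (!wp->armed || !(is_write || wp->is_write))
		return false;
	return addr < wp->addr + wp->size && wp->addr < addr + size;
}

static enum access_result check_access(struct soft_wp *wp, const void *ptr,
				       size_t size, bool is_write, bool sampled,
				       void (*stall)(void *), void *stall_arg)
{
	unsigned char before[sizeof(long)];
	bool changed;

	assert(size <= sizeof(before));

	if (wp_conflicts(wp, (uintptr_t)ptr, size, is_write))
		return RACE_WITH_WATCHPOINT;
	if (!sampled || wp->armed)
		return NO_RACE;

	/* Step 2: set up the watchpoint and stall. */
	*wp = (struct soft_wp){ (uintptr_t)ptr, size, is_write, true };
	/* Step 3: compare the data value before and after the delay. */
	memcpy(before, ptr, size);
	stall(stall_arg);
	changed = memcmp(before, ptr, size) != 0;
	wp->armed = false;

	return changed ? RACE_UNKNOWN_ORIGIN : NO_RACE;
}

/* Stand-ins for what may happen elsewhere during the delay. */
static void stall_idle(void *arg)
{
	(void)arg;
}

static void stall_uninstrumented_write(void *arg)
{
	*(int *)arg += 1; /* e.g. a device or uninstrumented code writing */
}
```

With stall_idle() the sampled access reports nothing; with
stall_uninstrumented_write() the value changes under the watchpoint and the
sketch returns RACE_UNKNOWN_ORIGIN, which is how I understand the second
report type in the documentation to arise.
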
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,

What happens if .disable is left at 0 here, i.e. if KCSAN is not disabled
for init_task?

> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
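
I had to work through the iteration order of SLOT_IDX by hand, so here is a
small stand-alone sketch of it. The constant values (64 watchpoints, 1
adjacent slot) are assumptions for illustration only, since the real
definitions are not in this hunk, and slot_offset()/slot_idx() are my own
names. One deliberate difference: slot_idx() adds NUM_WATCHPOINTS before the
modulo so that the result cannot go negative. In C, -1 % 64 is -1 for signed
operands, so whether the macro as posted needs something similar depends on
the type of KCSAN_NUM_WATCHPOINTS.

```c
#include <assert.h>

/* Assumed values for illustration; not taken from the patch. */
#define NUM_WATCHPOINTS 64
#define CHECK_ADJACENT 1
#define CHECK_NUM_SLOTS (1 + 2 * CHECK_ADJACENT)

/* Offset from the home slot on iteration i: 0, then right, then left. */
static int slot_offset(int i)
{
	assert(i >= 0 && i < CHECK_NUM_SLOTS);
	return ((i + CHECK_ADJACENT) % CHECK_NUM_SLOTS) - CHECK_ADJACENT;
}

/*
 * Table index visited on iteration i. Unlike the macro, this adds
 * NUM_WATCHPOINTS before the modulo to keep the index non-negative.
 */
static int slot_idx(int slot, int i)
{
	assert(slot >= 0 && slot < NUM_WATCHPOINTS);
	return (slot + slot_offset(i) + NUM_WATCHPOINTS) % NUM_WATCHPOINTS;
}
```

For a home slot of 10 this visits 10, 11, 9, which matches the comment
("address slot itself, followed by the right and left slots").
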
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
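
For anyone following the slot protocol above, here is a minimal userspace
sketch of the three transitions (free -> armed, armed -> consumed, anything ->
free). It uses C11 <stdatomic.h> rather than the kernel's atomic_long_t API,
and the slot_* names are mine, not from the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT  0L /* slot is free */
#define CONSUMED_WATCHPOINT 1L /* a racing access already claimed the slot */

/* Setup side: claim a free slot by swapping INVALID -> encoded value. */
static bool slot_insert(atomic_long *slot, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	/* The two sentinel values must never be used as an encoding. */
	assert(encoded != INVALID_WATCHPOINT && encoded != CONSUMED_WATCHPOINT);
	return atomic_compare_exchange_strong_explicit(slot, &expect, encoded,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Checking side: succeeds only if the slot still holds the value we saw. */
static bool slot_try_consume(atomic_long *slot, long seen)
{
	return atomic_compare_exchange_strong_explicit(slot, &seen,
						       CONSUMED_WATCHPOINT,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* Setup side: free the slot; false means it was consumed in the meantime. */
static bool slot_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

The point of the sketch is that only one checker can win the consume, and the
setup side learns about it from the value it exchanges out on removal.
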
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
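
To illustrate the counter-based sampling described in the comment above, a
rough userspace sketch follows. A thread-local counter stands in for the
per-CPU kcsan_skip, and the value 2000 is only a placeholder I picked for
CONFIG_KCSAN_WATCH_SKIP_INST:

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for CONFIG_KCSAN_WATCH_SKIP_INST; the value is an assumption. */
#define WATCH_SKIP_INST 2000U

static_assert(WATCH_SKIP_INST > 0, "skip interval must be non-zero");

/* Thread-local stand-in for the per-CPU counter: no cross-thread contention. */
static _Thread_local unsigned int skip_counter;

/* Return true for one out of every WATCH_SKIP_INST calls on this thread. */
static bool should_sample(void)
{
	return (++skip_counter % WATCH_SKIP_INST) == 0;
}
```

Unlike a PRNG, this is a single increment and a modulo on the fast path, which
is presumably why it was preferred over prandom_u32().
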
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE

if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
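
One way the suggestion could look, assuming the macro meant here is the
kernel's IS_ENABLED(). The snippet below is a userspace imitation of the
preprocessor trick from include/linux/kconfig.h (macro names changed, and an
extra dummy argument added to stay within strict C99); kcsan_init_sketch() is
a hypothetical stand-in for the tail of kcsan_init():

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace imitation of the kernel's IS_ENABLED() preprocessor trick. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) IS_DEFINED_2(option)

#define CONFIG_KCSAN_EARLY_ENABLE 1 /* remove this line to model "=n" */

/* The result is an integer constant expression, usable at compile time. */
static_assert(IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE) == 1, "option is set");

static bool kcsan_enabled;

static void kcsan_init_sketch(void)
{
	/*
	 * Unlike #ifdef, the body is always parsed and type-checked; when the
	 * option is off the condition is a constant 0 and the compiler can
	 * discard the store.
	 */
	if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
		kcsan_enabled = true;
}
```
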



> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
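
The lazy sort plus binary search above can be modelled in userspace roughly as
follows. This is only a sketch: qsort()/bsearch() from <stdlib.h> replace the
kernel's sort()/bsearch(), locking is left out, and struct filterlist and
skip_report() are my own names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of valid entries */
	bool sorted;          /* cleared whenever an entry is appended */
	bool whitelist;       /* false: blacklist (suppress listed functions) */
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Return true if a report for the function at func_start is suppressed. */
static bool skip_report(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	assert(fl != NULL);
	if (fl->used == 0)
		return false; /* empty list never filters, in either mode */

	/* Sort lazily on first lookup after an insertion. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_start, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	/* Whitelist mode inverts the answer: only listed functions report. */
	return fl->whitelist ? !found : found;
}
```
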
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
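
To make the bit layout concrete, here is a userspace sketch of the encoding
under explicit assumptions: a 64-bit word, 4 KiB pages, and
KCSAN_CHECK_ADJACENT == 1, which gives 14 size bits. The function names are
mine, and the size mask below covers exactly bits 49..62, which is my own
simplification rather than a copy of the GENMASK() expressions in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed configuration: 64-bit word, 4 KiB pages, one adjacent slot. */
#define MAX_ENCODABLE_SIZE (4096u * 2)   /* SLOT_RANGE * (1 + 1) */
#define SIZE_BITS 14                     /* bits needed to represent 8192 */
#define ADDR_BITS (64 - 1 - SIZE_BITS)   /* 49: shift of the size field */
#define WRITE_MASK (UINT64_C(1) << 63)
#define SIZE_MASK (((UINT64_C(1) << SIZE_BITS) - 1) << ADDR_BITS)
#define ADDR_MASK ((UINT64_C(1) << (ADDR_BITS - 1)) - 1) /* bits 0..47 */

static uint64_t encode_wp(uint64_t addr, size_t size, bool is_write)
{
	assert(size <= MAX_ENCODABLE_SIZE);
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK); /* upper address bits are discarded */
}

static void decode_wp(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
}

/* Two byte ranges overlap iff each starts no later than the other ends. */
static bool ranges_overlap(uint64_t addr1, size_t size1,
			   uint64_t addr2, size_t size2)
{
	return addr1 <= addr2 + size2 - 1 && addr2 <= addr1 + size1 - 1;
}
```

Because only the masked address survives, the checker has to mask its own
address the same way before comparing, which is what find_watchpoint() appears
to do with addr_masked.
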
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).

Strictly speaking, ThreadSanitizer is never spelled with a space (here
and in one other place).


> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
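
For readers less used to the token-pasting pattern in this file, a small
userspace sketch of the same idea. The hook_* and check_* names are
hypothetical stand-ins for the __tsan_* entry points and the
__kcsan_check_{read,write} helpers, and the byte counters exist only so the
effect is observable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for __kcsan_check_read()/__kcsan_check_write(). */
static size_t bytes_read, bytes_written;

static void check_read(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	bytes_read += size;
}

static void check_write(const void *ptr, size_t size)
{
	assert(ptr != NULL);
	bytes_written += size;
}

/*
 * Token pasting (##) stamps out one read hook and one write hook per access
 * size. The trailing prototype only exists to absorb the caller's semicolon.
 */
#define DEFINE_HOOK_READ_WRITE(size)                                   \
	void hook_read##size(void *ptr) { check_read(ptr, size); }     \
	void hook_write##size(void *ptr) { check_write(ptr, size); }   \
	void hook_write##size(void *ptr)

DEFINE_HOOK_READ_WRITE(1);
DEFINE_HOOK_READ_WRITE(4);
DEFINE_HOOK_READ_WRITE(8);
```

A compiler that instruments a 4-byte load would emit a call to the size-4 read
hook, which simply forwards the pointer and the constant size.
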
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets

"maps to func_addr"


> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
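
For readers following along, here is a minimal userspace sketch of the frame-skipping logic above. It assumes the frames are already symbolized strings (the patch formats raw addresses with "%ps"), and skip_runtime_frames() is a hypothetical name that does not appear in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch only: frames[] holds symbol names, innermost frame first.
 * Returns the index of the first frame that does not look like sanitizer
 * runtime code, using the same three substrings as the patch.
 */
int skip_runtime_frames(const char *const frames[], int num_entries)
{
	int skip = 0;

	for (; skip < num_entries; ++skip) {
		const char *sym = frames[skip];

		if (!strstr(sym, "csan_") && !strstr(sym, "tsan_") &&
		    !strstr(sym, "_once_size"))
			break; /* first frame outside the runtime */
	}
	return skip;
}
```

Note that the result equals num_entries when every frame matches, so a caller that indexes the array with it has to allow for that case.
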
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
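
To make the round-trip property this test checks concrete, here is a small userspace sketch. The bit layout, the WP_* names and wp_encode()/wp_decode() are all assumptions made for illustration; they are not copied from encoding.h, and only one special value is modelled instead of the two (INVALID_WATCHPOINT, CONSUMED_WATCHPOINT) in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit layout: [write:1][size:14][masked address:49] */
#define WP_SIZE_BITS	14
#define WP_ADDR_BITS	(64 - 1 - WP_SIZE_BITS)
#define WP_WRITE_MASK	(UINT64_C(1) << 63)
#define WP_ADDR_MASK	((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK	(~(WP_WRITE_MASK | WP_ADDR_MASK))
#define WP_INVALID	UINT64_C(0)

/* Pack access type, size and the low address bits into one word.
 * Callers must pass 1 <= size < 2^14 so the result is never WP_INVALID. */
uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_MASK : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack a watchpoint; returns false for the special "unused" value. */
bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
	       bool *is_write)
{
	if (wp == WP_INVALID)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_ADDR_BITS);
	*is_write = (wp & WP_WRITE_MASK) != 0;
	return true;
}
```

Because the upper address bits are dropped, decoding can only return the masked address, which is why the test compares against addr & WATCHPOINT_ADDR_MASK rather than addr.
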
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
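
The five cases above are consistent with matching_access() being a byte-range overlap check. A minimal sketch under that assumption (ranges_overlap() is a hypothetical name, sizes are assumed to be at least 1, and address wrap-around is ignored):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Two accesses overlap if neither one ends before the other begins.
 * The last byte touched by an access is addr + size - 1.
 */
bool ranges_overlap(unsigned long addr1, size_t size1,
		    unsigned long addr2, size_t size2)
{
	return addr1 + size1 - 1 >= addr2 && addr2 + size2 - 1 >= addr1;
}
```
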
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
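
As a rough illustration of the "1 in KCSAN_WATCH_SKIP_INST" sampling described in this help text, here is a userspace sketch. struct cpu_state and should_watch() are hypothetical stand-ins for a per-CPU counter; the real selection logic lives in core.c and may differ:

```c
#include <assert.h>
#include <stdbool.h>

#define WATCH_SKIP_INST 2000	/* mirrors the Kconfig default above */

/* Hypothetical stand-in for per-CPU state. */
struct cpu_state {
	unsigned long accesses;
};

/* Returns true for one in every WATCH_SKIP_INST instrumented accesses. */
bool should_watch(struct cpu_state *cpu)
{
	return ++cpu->accesses % WATCH_SKIP_INST == 0;
}
```

With the default of 2000, a CPU that performs 10000 instrumented accesses would set up a watchpoint on 5 of them.
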
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 10:28     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 10:28 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall some delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
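
To check my reading of step 3: a minimal userspace sketch, assuming a
hypothetical stall callback that stands in for the delay and lets the demo
change memory behind the watcher's back (as a device or an uninstrumented
writer might). The names value_changed_during_stall and stall_fn are mine,
not from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the watchpoint delay. */
typedef void (*stall_fn)(void *arg);

/*
 * Snapshot up to 8 bytes at ptr, stall, then snapshot again.
 * Returns true if the bytes differ, i.e. something wrote to the
 * location while we were stalled.
 */
static bool value_changed_during_stall(const void *ptr, size_t size,
                                       stall_fn stall, void *arg)
{
        uint64_t before = 0, after = 0;

        if (size == 0 || size > sizeof(before))
                return false; /* this sketch only handles 1..8 byte accesses */

        memcpy(&before, ptr, size);
        stall(arg);
        memcpy(&after, ptr, size);

        return before != after;
}

/* Demo stalls: one leaves memory alone, one simulates a concurrent write. */
static void stall_quiet(void *arg) { (void)arg; }
static void stall_with_write(void *arg) { *(uint8_t *)arg ^= 0xff; }
```

If that matches the intent, the "race at unknown origin" report is exactly
the case where the comparison fails but no instrumented access hit the
watchpoint.
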
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,

What will happen if we don't disable it?

> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
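
I tried the macro in userspace to see the probe order. This is only a
sketch: the values of KCSAN_NUM_WATCHPOINTS and KCSAN_CHECK_ADJACENT below
are assumptions on my side (they are defined outside this hunk), and
probe_order() is a helper I made up for the demo.

```c
#include <assert.h>

/* Assumed values; the real definitions are not part of the quoted hunk. */
#define KCSAN_NUM_WATCHPOINTS 64
#define KCSAN_CHECK_ADJACENT 1

/* Copied from the patch. */
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                                      \
        ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
                  KCSAN_CHECK_ADJACENT)) %                                     \
         KCSAN_NUM_WATCHPOINTS)

/* Fill out[] with the slots probed for a given home slot, in probe order. */
static void probe_order(int slot, int out[CHECK_NUM_SLOTS])
{
        for (int i = 0; i < CHECK_NUM_SLOTS; ++i)
                out[i] = SLOT_IDX(slot, i);
}
```

For a home slot of 5 this gives 5, 6, 4, which matches the comment. With
plain int operands, a home slot of 0 gives -1 for the left neighbour,
because % truncates toward zero in C. Whether that can happen here depends
on how KCSAN_NUM_WATCHPOINTS and watchpoint_slot() are typed, which I can't
tell from this hunk, so it may be worth a look.
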
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
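
For my own understanding, the insert/consume/remove protocol can be
sketched with C11 atomics as below. INVALID_WATCHPOINT being 0 follows from
the comment above ("zero-initialized state matches INVALID_WATCHPOINT");
the value of CONSUMED_WATCHPOINT is an assumption, as encoding.h is not
quoted here. The *_sketch names are mine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT  0L
#define CONSUMED_WATCHPOINT 1L /* assumed value, for illustration only */

/* Claim a free slot: succeeds only if the slot is currently invalid. */
static bool insert_sketch(atomic_long *wp, long encoded)
{
        long expect = INVALID_WATCHPOINT;

        return atomic_compare_exchange_strong_explicit(
                wp, &expect, encoded,
                memory_order_relaxed, memory_order_relaxed);
}

/* Racing access: succeeds only if the slot still holds what we decoded. */
static bool try_consume_sketch(atomic_long *wp, long encoded)
{
        return atomic_compare_exchange_strong_explicit(
                wp, &encoded, CONSUMED_WATCHPOINT,
                memory_order_relaxed, memory_order_relaxed);
}

/* Owner releases the slot; returns false if a racing access consumed it. */
static bool remove_sketch(atomic_long *wp)
{
        return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
                                        memory_order_relaxed) !=
               CONSUMED_WATCHPOINT;
}
```

So the owner learns about a race from remove returning false, and a second
racing thread loses the cmpxchg in try_consume. That matches cases 1-3 in
the comment, if I read it right.
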
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
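
Just to spell out the sampling: a userspace sketch of the modulo counter,
with a plain global standing in for the per-CPU variable and an assumed
skip count of 4 (the real CONFIG_KCSAN_WATCH_SKIP_INST default is not
visible in this hunk). The *_SKETCH/_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define WATCH_SKIP_INST_SKETCH 4UL /* assumed value, for illustration only */

/* Stand-in for the per-CPU kcsan_skip counter. */
static unsigned long skip_counter;

/* Every WATCH_SKIP_INST_SKETCH-th call is selected for a watchpoint. */
static bool should_watch_sketch(void)
{
        return (++skip_counter % WATCH_SKIP_INST_SKETCH) == 0;
}
```

So on a single CPU the selection is strictly every N-th non-atomic access;
any unpredictability comes from interleaving and migration rather than
from the counter itself.
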
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE

if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
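
The point of IS_ENABLED() over #ifdef is that the guarded code is always
parsed and type-checked, and the compiler drops it when the option is off.
Below is a simplified userspace imitation of the preprocessor trick behind
it; the kernel's real macro also handles the =m case, and every name in the
block (IS_ENABLED_SKETCH, CONFIG_DEMO_*, init_sketch) is made up for the
demo.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * If the option is defined to 1, the token paste yields ARG_PLACEHOLDER_1,
 * which expands to "0," and shifts the literal 1 into second position.
 * Otherwise the paste yields an unknown identifier and 0 stays second.
 */
#define ARG_PLACEHOLDER_1 0,
#define take_second_arg(ignored, val, ...) val
#define is_defined(x) is_defined_(x)
#define is_defined_(val) is_defined__(ARG_PLACEHOLDER_##val)
#define is_defined__(arg1_or_junk) take_second_arg(arg1_or_junk 1, 0, 0)
#define IS_ENABLED_SKETCH(option) is_defined(option)

#define CONFIG_DEMO_EARLY_ENABLE 1
/* CONFIG_DEMO_LATE_ENABLE is deliberately left undefined. */

static bool enabled_flag;

static void init_sketch(void)
{
        /* Always compiled; the branch is a compile-time constant. */
        if (IS_ENABLED_SKETCH(CONFIG_DEMO_EARLY_ENABLE))
                enabled_flag = true;
}
```
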



> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
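
To make the counter semantics concrete, here is a small userspace model of
the per-context disable count. It is a sketch only: it ignores the global
kcsan_enabled flag, it reports a mismatch via the return value instead of
WARN(), and the struct and function names are mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Models only the 'disable' field of struct kcsan_ctx. */
struct ctx_sketch {
        int disable;
};

static bool checks_enabled(const struct ctx_sketch *ctx)
{
        return ctx->disable == 0;
}

static void disable_current_sketch(struct ctx_sketch *ctx)
{
        ++ctx->disable;
}

/* Returns false on an unbalanced enable and leaves the counter at 0. */
static bool enable_current_sketch(struct ctx_sketch *ctx)
{
        if (ctx->disable == 0)
                return false;
        --ctx->disable;
        return true;
}
```

With init_task starting at disable = 1 and kcsan_init() calling
kcsan_enable_current(), the counter for the init task reaches 0 only at
that point, so as far as I can tell nothing is checked in that context
before kcsan_init() runs.
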
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
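
To check my reading of the "unknown origin" path: the value is snapshotted,
the thread stalls, and the value is re-read; a difference is taken as a race
even if no instrumented access hit the watchpoint. A minimal userspace sketch
of that idea follows. All names here (read_sized, value_changed_during, the
two callbacks) are hypothetical, and the stall callback only stands in for
udelay(); it is not meant as a replacement for the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 1, 2, 4 or 8 bytes at ptr into *val. */
static bool read_sized(const void *ptr, size_t size, uint64_t *val)
{
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false; /* like the patch: other sizes are not diffed */
	*val = 0;
	memcpy(val, ptr, size);
	return true;
}

/*
 * Snapshot the value, run stall() (stand-in for the delay), then re-read.
 * Returns true if the value changed across the stall.
 */
static bool value_changed_during(const void *ptr, size_t size,
				 void (*stall)(void *), void *arg)
{
	uint64_t before, after;

	if (!read_sized(ptr, size, &before))
		return false;
	stall(arg);
	read_sized(ptr, size, &after);
	return before != after;
}

/* Example stall callbacks: one modifies the watched word, one does not. */
static void bump_u32(void *arg)
{
	++*(uint32_t *)arg;
}

static void no_op(void *arg)
{
	(void)arg;
}
```

Calling value_changed_during(&v, sizeof(v), bump_u32, &v) models a writer that
KCSAN never saw; with no_op nothing is inferred.
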
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
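
The lazy sort-then-bsearch scheme looks fine to me. For reference, a userspace
sketch of the same lookup, using qsort()/bsearch() from libc in place of the
kernel's sort()/bsearch(). The struct and function names are made up for the
example, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for report_filterlist (no locking here). */
struct filterlist {
	unsigned long *addrs;
	size_t used;
	bool sorted;
	bool whitelist;
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if reports for the function starting at func_start are skipped. */
static bool filter_skip(struct filterlist *fl, unsigned long func_start)
{
	bool found;

	if (fl->used == 0)
		return false; /* empty list: nothing is filtered, as in the patch */

	/* Sort only when an insert has invalidated the order. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_start, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	/* Blacklist skips listed functions; whitelist skips everything else. */
	return fl->whitelist ? !found : found;
}
```

One consequence worth documenting: an empty whitelist filters nothing, because
of the early return on used == 0.
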
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
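
A few things here: kvmalloc_array() with GFP_KERNEL may sleep, but it is
called with report_filterlist_lock held and interrupts off; the allocation
result is never checked before memcpy() or the store; and
kallsyms_lookup_name() is called a second time although addr already holds the
result. The allocation would need to happen before taking the lock, or use a
non-sleeping GFP flag. The checked-growth part can be sketched in userspace as
follows; malloc() stands in for the kernel allocator, and the names
(addr_list, addr_list_insert) and the fallback capacity of 8 are assumptions
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for report_filterlist. */
struct addr_list {
	unsigned long *addrs;
	size_t size; /* capacity in elements */
	size_t used; /* elements in use */
	bool sorted;
};

/* Append addr, doubling capacity when full. Returns false on allocation failure. */
static bool addr_list_insert(struct addr_list *l, unsigned long addr)
{
	if (l->addrs == NULL || l->used == l->size) {
		size_t new_size = l->addrs ? l->size * 2 : (l->size ? l->size : 8);
		unsigned long *new_addrs = malloc(new_size * sizeof(*new_addrs));

		if (new_addrs == NULL)
			return false; /* list is left unchanged */
		if (l->used)
			memcpy(new_addrs, l->addrs, l->used * sizeof(*new_addrs));
		free(l->addrs);
		l->addrs = new_addrs;
		l->size = new_size; /* only updated once the new array exists */
	}

	l->addrs[l->used++] = addr;
	l->sorted = false;
	return true;
}
```

Updating the capacity only after a successful allocation keeps size and addrs
consistent if the allocation fails.
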
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
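
Two small points on debugfs_write(): a failing copy_from_user() is
conventionally reported as -EFAULT rather than -EINVAL, and because every
command is compared with strncmp() over the keyword length only, any input
that merely starts with a keyword is accepted. Since arg has already been
through strstrip(), a plain strcmp() would be stricter, if that is what is
wanted. The current matching behaviour, as I read it, in a userspace sketch
(the enum and parse_cmd() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kcsan_cmd {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_ADD_FUNC,
	CMD_INVALID,
};

/*
 * Classify a stripped command string using the same prefix comparisons as
 * the patch. For CMD_ADD_FUNC, *func points at the name after the '!'.
 */
static enum kcsan_cmd parse_cmd(const char *arg, const char **func)
{
	*func = NULL;

	if (!strncmp(arg, "on", sizeof("on") - 1))
		return CMD_ON;
	if (!strncmp(arg, "off", sizeof("off") - 1))
		return CMD_OFF;
	if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
		return CMD_WHITELIST;
	if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
		return CMD_BLACKLIST;
	if (arg[0] == '!') {
		*func = arg + 1;
		return CMD_ADD_FUNC;
	}
	return CMD_INVALID;
}
```

With this logic "online" is treated the same as "on".
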
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
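
To convince myself of the bit layout I modelled it in userspace. This is only
a sketch: it assumes a 64-bit long and 4 KiB pages (so SIZE_BITS is
bits_per(2 * 4096) = 14), and GENMASK64 is a local re-implementation, not the
kernel macro. If I read the masks correctly, bit ADDR_BITS - 1 is covered by
WATCHPOINT_SIZE_MASK but never written by encode_watchpoint(), and it is not
part of WATCHPOINT_ADDR_MASK either, so one address bit seems to be lost.
Round-tripping still works because that bit is always zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS 64      /* assumption: 64-bit long */
#define SIZE_BITS 14 /* assumption: bits_per(2 * 4096) */
#define ADDR_BITS (BITS - 1 - SIZE_BITS)

/* Local model of GENMASK(h, l) for 64-bit values. */
#define GENMASK64(h, l) \
	((~UINT64_C(0) >> (63 - (h))) & (~UINT64_C(0) << (l)))

/* Masks written exactly as in the patch. */
#define WRITE_MASK (UINT64_C(1) << (BITS - 1))
#define SIZE_MASK GENMASK64(BITS - 2, BITS - 2 - SIZE_BITS)
#define ADDR_MASK GENMASK64(BITS - 3 - SIZE_BITS, 0)

static uint64_t encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

static void decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		   bool *is_write)
{
	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
}
```

Expressing both masks in terms of ADDR_BITS, i.e. size in [BITS - 2, ADDR_BITS]
and address in [ADDR_BITS - 1, 0], would avoid the gap.
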
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
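
matching_access() is the usual closed-interval overlap test. Note that it
relies on both sizes being non-zero: with size 0 the "- 1" wraps the end of
the range below its start. A small userspace sketch with the same logic (the
name ranges_overlap is mine), assuming non-zero sizes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True if [addr1, addr1 + size1 - 1] and [addr2, addr2 + size2 - 1] share at
 * least one byte. Assumes size1 and size2 are non-zero.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1;
	unsigned long end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

For example, a 4-byte access at 100 overlaps a 1-byte access at 103, but not a
4-byte access at 104.
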
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).

Strictly speaking, ThreadSanitizer is written as one word, without a space
(here and in one other place).


> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required for KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets

s/maps to addr/maps to func_addr/, to match the parameter name.


> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
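
To make the frame-skipping heuristic above easier to follow, here is a minimal userspace sketch. It assumes the frames are already symbolized into plain strings (the kernel code formats them with "%ps" instead), and the name skip_runtime_frames is hypothetical:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for get_stack_skipnr(): returns the
 * index of the first frame that does not look like sanitizer runtime.
 */
static int skip_runtime_frames(const char *const frames[], int num_frames)
{
	int skip = 0;

	for (; skip < num_frames; ++skip) {
		const char *name = frames[skip];

		/* Same substrings as the quoted code checks for. */
		if (!strstr(name, "csan_") && !strstr(name, "tsan_") &&
		    !strstr(name, "_once_size"))
			break;
	}
	return skip;
}
```

Note that the function returns num_frames when every frame matches, so a caller indexing the array with the result has to be prepared for that case.
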
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
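
The title ordering in print_summary() can be sketched in userspace as follows. This assumes plain function-name strings in place of symbolized addresses, and format_race_title is a hypothetical helper, not something in the patch:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the report title with the two function
 * names sorted, so that both racing threads produce the same title.
 */
static void format_race_title(char *buf, size_t len,
			      const char *this_fn, const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
		 cmp < 0 ? other_fn : this_fn,
		 cmp < 0 ? this_fn : other_fn);
}
```

Whichever thread ends up printing the report, the smaller name comes first, which is presumably what keeps titles stable for deduplication.
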
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
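
The expectations in test_matching_access() above are consistent with an inclusive byte-range overlap check. matching_access() itself lives in encoding.h, which is not quoted in this part of the thread, so the following is only a sketch under that assumption (ranges_overlap is a hypothetical name; sizes are assumed to be at least 1 and the ranges are assumed not to wrap around):

```c
#include <stdbool.h>
#include <stddef.h>

/* Two accesses match if their byte ranges share at least one byte. */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of range 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of range 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

With this definition the five cases in the test behave as expected: the first three overlap, the last two are adjacent but disjoint.
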
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         Enable KCSAN globally as soon as possible. KCSAN can later be
> +         enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         Randomize delays; if disabled, the chosen delay is always the
> +         maximum value defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
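
To illustrate the "1 in KCSAN_WATCH_SKIP_INST" sampling described in the help text above, here is a minimal sketch of a countdown-based sampler. The struct and function names are hypothetical, and the actual logic in core.c is not quoted in this part of the thread, so treat this as one possible reading rather than the implementation:

```c
#include <stdbool.h>

/* Hypothetical per-CPU sampling state; names are illustrative only. */
struct skip_state {
	long skip_inst;	/* plays the role of CONFIG_KCSAN_WATCH_SKIP_INST */
	long remaining;	/* accesses left before the next watchpoint */
};

static void skip_state_init(struct skip_state *s, long skip_inst)
{
	s->skip_inst = skip_inst;
	s->remaining = skip_inst;
}

/* Returns true for 1 in skip_inst calls; all other calls are skipped. */
static bool should_set_up_watchpoint(struct skip_state *s)
{
	if (--s->remaining > 0)
		return false;
	s->remaining = s->skip_inst;
	return true;
}
```

With the default of 2000, 10000 instrumented accesses on one CPU would set up 5 watchpoints in this model.
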
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         Report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-23 11:08     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 11:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. Such reports can occur due to either missing
> +instrumentation or, e.g., DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
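
The informal definition above can be written down as a predicate. This is only a sketch of the definition as stated in the text: struct mem_access and both function names are hypothetical, "same memory location" is simplified to equal addresses, and the happens-before relation is supplied by the caller as a flag because computing it is exactly what the LKMM is for:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a single memory operation. */
struct mem_access {
	int thread;		/* id of the issuing thread */
	uintptr_t addr;		/* accessed location */
	bool is_write;
	bool is_marked;		/* READ_ONCE/WRITE_ONCE/atomic_* style access */
};

/* Conflict: same location, and at least one access is a write. */
static bool accesses_conflict(const struct mem_access *a,
			      const struct mem_access *b)
{
	return a->addr == b->addr && (a->is_write || b->is_write);
}

/*
 * Data-race: conflicting accesses from different threads, at least one
 * of them plain, and not ordered by happens-before.
 */
static bool is_data_race(const struct mem_access *a,
			 const struct mem_access *b, bool ordered)
{
	return a->thread != b->thread && accesses_conflict(a, b) &&
	       (!a->is_marked || !b->is_marked) && !ordered;
}
```

Two marked accesses never form a data-race under this definition, even if they conflict and are unordered.
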
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
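
As an illustration of packing access type, size, and address into a single word, here is a minimal sketch. The bit layout (1 write bit, 14 size bits, 49 address bits in a 64-bit value) and all names are assumptions made for the example; the real layout is defined in kernel/kcsan/encoding.h, which is not quoted in this part of the thread:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [63] write flag, [62:49] size, [48:0] masked address. */
#define WP_WRITE_BIT	(UINT64_C(1) << 63)
#define WP_SIZE_BITS	14
#define WP_SIZE_SHIFT	(63 - WP_SIZE_BITS)
#define WP_SIZE_MASK	((UINT64_C(1) << WP_SIZE_BITS) - 1)
#define WP_ADDR_MASK	((UINT64_C(1) << WP_SIZE_SHIFT) - 1)

/* Caller must ensure 0 < size <= WP_SIZE_MASK. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       (addr & WP_ADDR_MASK);
}

/* Returns false for the reserved value 0 ("no watchpoint"). */
static bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	if (wp == 0)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the whole watchpoint fits in one word, a slot can be read or updated with a single atomic operation, which fits the claim that the fast path needs no shared locks. Only the masked address survives the round trip, which matches the comparison against the masked address in test_encode_decode().
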
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
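
Step 3 above (inferring a race of unknown origin from a changed value) can be sketched in userspace like this. The delay is modeled as a callback so that the effect of a concurrent or device write can be simulated deterministically; every name here is hypothetical, and the watchpoint lookup of step 1 is deliberately left out:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_WATCH_SIZE 8

/* Hypothetical stand-in for the delay window; may modify memory. */
typedef void (*delay_fn)(void *ctx);

/* Copy bytes through a volatile pointer so each byte is really read. */
static void snapshot(unsigned char *dst, const volatile void *src,
		     size_t size)
{
	const volatile unsigned char *p = src;
	size_t i;

	for (i = 0; i < size; ++i)
		dst[i] = p[i];
}

/* Returns true if the watched bytes changed while we were stalled. */
static bool watch_for_value_change(const volatile void *ptr, size_t size,
				   delay_fn delay, void *ctx)
{
	unsigned char before[MAX_WATCH_SIZE];
	unsigned char after[MAX_WATCH_SIZE];

	if (size > MAX_WATCH_SIZE)
		return false;

	snapshot(before, ptr, size);	/* value before the delay */
	delay(ctx);			/* stall; others may write now */
	snapshot(after, ptr, size);	/* value after the delay */
	return memcmp(before, after, size) != 0;
}

/* Example delay callbacks: one leaves memory alone, one writes to it. */
static void delay_quiet(void *ctx)
{
	(void)ctx;
}

static void delay_with_foreign_write(void *ctx)
{
	*(int *)ctx += 1;
}
```

A write that restores the original value within the delay window goes unnoticed by this check, which is one reason the approach can miss races.
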
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omitted annotation
> +   leads to false positives, which matters especially in the kernel with its
> +   numerous custom synchronization mechanisms. As a result, KCSAN's
> +   maintenance overhead stays minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)

Building with clang 10, I still see:

  CC      kernel/kcsan/core.o
kernel/kcsan/core.o: warning: objtool: __kcsan_check_watchpoint()+0x228: call to __stack_chk_fail() with UACCESS enabled
kernel/kcsan/core.o: warning: objtool: __kcsan_setup_watchpoint()+0x3be: call to __stack_chk_fail() with UACCESS enabled


> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
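
To convince myself of the probe order that SLOT_IDX() produces, here is
a small user-space sketch. The values of KCSAN_CHECK_ADJACENT and
KCSAN_NUM_WATCHPOINTS are assumptions for illustration (the real ones
are not in this hunk), and slot_offset(), slot_idx() and
print_probe_order() are hypothetical helper names, not part of the
patch.

```c
#include <stdio.h>

/* Assumed values for illustration only; the real ones are defined elsewhere. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)

/* Signed offset probed on iteration i: 0, +1, ..., +K, then -K, ..., -1. */
int slot_offset(int i)
{
	return ((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
	       KCSAN_CHECK_ADJACENT;
}

/*
 * Ring index for iteration i. This sketch adds KCSAN_NUM_WATCHPOINTS before
 * the final modulo so that the left operand of % is never negative; with
 * plain signed ints, (0 + -1) % 64 would be -1 in C.
 */
int slot_idx(int slot, int i)
{
	return (slot + slot_offset(i) + KCSAN_NUM_WATCHPOINTS) %
	       KCSAN_NUM_WATCHPOINTS;
}

/* Print the probe order for one starting slot. */
void print_probe_order(int slot)
{
	for (int i = 0; i < CHECK_NUM_SLOTS; ++i)
		printf("i=%d -> slot %d\n", i, slot_idx(slot, i));
}
```

With these assumed values the order is slot, slot + 1, slot - 1, which
matches the comment above the macro. In the patch, slot and i are both
int, so whether slot 0 with offset -1 wraps to the last slot depends on
the type of KCSAN_NUM_WATCHPOINTS, which I cannot see from here.
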
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
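
For readers following the watchpoint life cycle (insert, consume,
remove), a user-space sketch with C11 atomics may help. It only models
the three helpers above; the wp_* names are hypothetical, and the value
of CONSUMED_WATCHPOINT is an assumption since encoding.h is not part of
this hunk. INVALID_WATCHPOINT is 0 because the patch comment says the
zero-initialized state matches it.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L  /* zero-initialized state, per the patch */
#define CONSUMED_WATCHPOINT 1L /* assumed value, for illustration only */

/* Claim an empty slot; fails if the slot already holds a watchpoint. */
bool wp_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Mark the watchpoint as consumed, but only if it is still the one we saw. */
bool wp_try_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &encoded, CONSUMED_WATCHPOINT,
		memory_order_relaxed, memory_order_relaxed);
}

/* Free the slot; returns true if nobody consumed it in the meantime. */
bool wp_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

The point of the single-word encoding is visible here: each transition
is one compare-exchange or exchange, so no lock is needed around a
slot.
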
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
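
A small user-space model of the decision order in is_atomic(), in case
it is useful for discussion. The struct and function names are
hypothetical, and the addr_is_listed argument stands in for the
kcsan_is_atomic() lookup.

```c
#include <stdbool.h>

/* Mirrors the three fields of struct kcsan_ctx that is_atomic() reads. */
struct atomic_state_model {
	int atomic_next;
	int atomic_region;
	bool atomic_region_flat;
};

/* Same order of checks as is_atomic() in the patch. */
bool model_is_atomic(struct atomic_state_model *ctx, bool addr_is_listed)
{
	if (ctx->atomic_next > 0) {
		--ctx->atomic_next; /* one "next" credit is used up */
		return true;
	}
	if (ctx->atomic_region > 0 || ctx->atomic_region_flat)
		return true;

	return addr_is_listed; /* stand-in for kcsan_is_atomic(ptr) */
}
```

One consequence of this order is that atomic_next credits are consumed
first, even while an atomic region is active.
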
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
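
The sampling in should_watch() boils down to "watch every Nth
non-atomic access". A minimal user-space model, assuming a skip value
of 4 and a single global counter in place of the per-CPU kcsan_skip
(both are stand-ins, and should_watch_model() is a hypothetical name):

```c
#include <stdbool.h>

#define WATCH_SKIP_INST 4UL /* stand-in for CONFIG_KCSAN_WATCH_SKIP_INST */

/* Stand-in for the per-CPU kcsan_skip counter. */
static unsigned long skip_counter;

/* Atomic accesses return early and do not advance the counter. */
bool should_watch_model(bool access_is_atomic)
{
	if (access_is_atomic)
		return false;

	/* Pre-increment, like this_cpu_inc_return(): test the new value. */
	return (++skip_counter % WATCH_SKIP_INST) == 0;
}
```
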
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
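
The nesting behaviour of the disable counter can be modelled in user
space as below. This is only a sketch: the names are hypothetical, and
the WARN() on a mismatched enable is replaced by a false return value.

```c
#include <stdbool.h>

struct disable_ctx_model {
	int disable; /* 0 means KCSAN is active for this context */
};

void model_disable_current(struct disable_ctx_model *ctx)
{
	++ctx->disable;
}

/* Returns false on an unbalanced enable and leaves the counter at 0. */
bool model_enable_current(struct disable_ctx_model *ctx)
{
	if (ctx->disable-- == 0) {
		++ctx->disable; /* restore to 0 */
		return false;   /* the patch calls WARN() here */
	}
	return true;
}

bool model_is_enabled(const struct disable_ctx_model *ctx)
{
	return ctx->disable == 0;
}
```

Starting from disable = 1, as init_task does, a single enable call
brings the context to the active state, which is what kcsan_init()
relies on.
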
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else

If it's flat, shouldn't we do WARN_ON(get_ctx()->atomic_region_flat)?

> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {

WARN_ON(!get_ctx()->atomic_region_flat)?

> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
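
To make the two WARN_ON() suggestions above concrete, here is a
user-space sketch of what I have in mind. It is a model, not a proposed
patch: the names are hypothetical and a warnings counter stands in for
WARN_ON(). I have not checked whether a flat region can legitimately be
entered while another flat region is active; if it can, the first check
would be wrong.

```c
#include <stdbool.h>

struct atomic_ctx_model {
	int atomic_region;       /* depth of nestable regions */
	bool atomic_region_flat; /* inside a flat region */
	int warnings;            /* stand-in for WARN_ON() firing */
};

void model_begin_atomic(struct atomic_ctx_model *ctx, bool nest)
{
	if (nest) {
		++ctx->atomic_region;
	} else {
		if (ctx->atomic_region_flat)
			++ctx->warnings; /* flat region begun twice */
		ctx->atomic_region_flat = true;
	}
}

void model_end_atomic(struct atomic_ctx_model *ctx, bool nest)
{
	if (nest) {
		if (ctx->atomic_region-- == 0) {
			++ctx->atomic_region; /* restore to 0 */
			++ctx->warnings;      /* mismatched end */
		}
	} else {
		if (!ctx->atomic_region_flat)
			++ctx->warnings; /* flat end without begin */
		ctx->atomic_region_flat = false;
	}
}
```

A flat region nested inside a nestable one (the seqlock reader inside
writer case from the kcsan_ctx comment) stays warning-free in this
model, since the two kinds of region use separate state.
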
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;

It would be reasonable to return -ENOENT to the user here.

> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);

This allocation can fail, and the result is used unchecked below.

> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);

This can fail too.
Would it be easier to use krealloc? It's useful to have a cap on the list
size anyway.
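
Roughly what I have in mind, as a userspace sketch only: realloc() stands in
for krealloc(), and struct filterlist, filterlist_insert() and FILTERLIST_MAX
are names and values I made up for illustration, not anything from the patch.
The point is that the old array stays valid when the allocation fails, and
that the caller gets an error it can pass back to the user.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define FILTERLIST_MAX 1024 /* hypothetical cap on the number of entries */

struct filterlist {
	unsigned long *addrs; /* array of addresses */
	size_t size;          /* allocated capacity, in elements */
	size_t used;          /* number of elements in use */
};

/* Append addr to the list; returns 0 on success or a negative errno value. */
static int filterlist_insert(struct filterlist *fl, unsigned long addr)
{
	assert(fl->used <= fl->size);

	if (fl->used == fl->size) {
		size_t new_size = fl->size ? fl->size * 2 : 8;
		unsigned long *new_addrs;

		if (fl->used >= FILTERLIST_MAX)
			return -ENOSPC; /* list is capped */
		if (new_size > FILTERLIST_MAX)
			new_size = FILTERLIST_MAX;

		/* realloc(NULL, n) behaves like malloc(n), so one path suffices. */
		new_addrs = realloc(fl->addrs, new_size * sizeof(*new_addrs));
		if (!new_addrs)
			return -ENOMEM; /* the old array is untouched and still valid */

		fl->addrs = new_addrs;
		fl->size = new_size;
	}

	fl->addrs[fl->used++] = addr;
	return 0;
}
```

With a single resize path the initial allocation and the growth case no longer
need separate branches, and both failure modes end up as return values.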

> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;

This should return -EFAULT.

> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))

It would be cleaner to use strcmp (trim the trailing newline first).
Otherwise we accept anything starting with "on".
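
Something along these lines, as a userspace sketch: parse_cmd() and enum cmd
are hypothetical names, only the command strings are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cmd {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_FILTER, /* "!<function name>" */
	CMD_INVALID,
};

/* Strip one trailing newline in place, then match the whole string. */
static enum cmd parse_cmd(char *arg)
{
	size_t len;

	assert(arg != NULL);
	len = strlen(arg);
	if (len > 0 && arg[len - 1] == '\n')
		arg[len - 1] = '\0';

	if (!strcmp(arg, "on"))
		return CMD_ON;
	if (!strcmp(arg, "off"))
		return CMD_OFF;
	if (!strcmp(arg, "whitelist"))
		return CMD_WHITELIST;
	if (!strcmp(arg, "blacklist"))
		return CMD_BLACKLIST;
	if (arg[0] == '!')
		return CMD_FILTER;
	return CMD_INVALID;
}
```

With exact matching, "on\n" is still accepted while "only" or "offline" are
rejected instead of silently toggling KCSAN.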


> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still happens
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required KCSAN, but can still be emitted by the compiler.

Is "for" missing before KCSAN?

> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16

Increase it to at least 64. There is no reason to truncate potentially useful info.

> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should to be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses to not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";

in_nmi() is evaluated in the context of the thread printing the report, so
it will return the wrong thing for the other thread. We either need to
record it together with the pid, or I would simply always print
"interrupt", because NMI vs. non-NMI can be inferred from the stack if necessary.

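The first option could look roughly like this. It is a userspace sketch with
made-up names (struct access_ctx, thread_desc()); in the patch the flag would
presumably be filled in from in_nmi() at the point where task_pid is captured,
and then only read back when printing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Context recorded by the accessing thread itself, at capture time. */
struct access_ctx {
	int task_pid; /* -1 if the access did not happen in task context */
	bool in_nmi;  /* sampled together with task_pid, not at print time */
};

/* Describe the recorded context; buf is only used for the task case. */
static const char *thread_desc(const struct access_ctx *ctx, char *buf,
			       size_t len)
{
	if (ctx->task_pid != -1) {
		snprintf(buf, len, "task %i", ctx->task_pid);
		return buf;
	}
	return ctx->in_nmi ? "NMI" : "interrupt";
}
```

The printing side then never consults its own execution context when
describing the other thread.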

> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
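The five assertions above pin down the intended semantics: two accesses match when their byte ranges overlap by at least one byte. A minimal sketch of such a predicate follows, assuming half-open ranges and no address wrap-around. The name `ranges_overlap` is hypothetical; the real `matching_access()` lives in encoding.h, which is not quoted here, and may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): true if the byte ranges
 * [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least one byte.
 * Assumes size1, size2 >= 1 and that neither range wraps around.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	return addr1 < addr2 + size2 && addr2 < addr1 + size1;
}
```

Fed the same arguments as the selftest, this returns true for the first three cases and false for the last two, where the ranges are adjacent but do not overlap.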
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
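The "1 in KCSAN_WATCH_SKIP_INST" sampling described in this help text can be pictured as a countdown. The sketch below is an illustration only: `WATCH_SKIP_INST`, `skip_count` and `should_watch` are hypothetical names, a plain static stands in for a per-CPU variable, and core.c may implement the skip differently.

```c
#include <assert.h>
#include <stdbool.h>

#define WATCH_SKIP_INST 2000 /* mirrors the Kconfig default above */

/* Hypothetical per-CPU state; a plain static stands in for a per-CPU variable. */
static long skip_count = WATCH_SKIP_INST;

/* Returns true for the one access in WATCH_SKIP_INST that should set up a watchpoint. */
static bool should_watch(void)
{
	if (--skip_count > 0)
		return false;
	skip_count = WATCH_SKIP_INST; /* reset for the next sampling interval */
	return true;
}
```

A smaller `WATCH_SKIP_INST` makes `should_watch()` fire more often, which is the "more aggressive race detection" trade-off the help text mentions.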
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 11:08     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 11:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
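The definition above can be restated as a small predicate. This is only a sketch of the wording in the paragraph: `struct mem_op` and both function names are hypothetical, "same memory location" is simplified to equal addresses, and whether two operations are ordered by happens-before is taken as an input, because deciding that is the LKMM's job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one memory operation in an execution. */
struct mem_op {
	int thread;	/* issuing thread */
	uintptr_t addr;	/* accessed location */
	bool is_write;
	bool is_plain;	/* not marked via READ_ONCE/WRITE_ONCE/atomic_* */
};

/* Two operations conflict if they touch the same location and one is a write. */
static bool ops_conflict(struct mem_op a, struct mem_op b)
{
	return a.addr == b.addr && (a.is_write || b.is_write);
}

/*
 * ordered: whether happens-before orders the two operations; this is
 * simply passed in rather than computed.
 */
static bool is_data_race(struct mem_op a, struct mem_op b, bool ordered)
{
	return a.thread != b.thread && ops_conflict(a, b) &&
	       (a.is_plain || b.is_plain) && !ordered;
}
```

Note that a marked write racing with a plain read still counts as a data-race under this definition, since only one of the two accesses needs to be plain.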
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious to any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
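To make "stores access type, size, and address in a long" concrete, here is one possible packing for a 64-bit word. The bit layout, the 14-bit size field and all `wp_*`/`WP_*` names are assumptions for illustration; the actual layout is defined in kernel/kcsan/encoding.h, which is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: [63] is_write | [62:49] size | [48:0] low address bits. */
#define WP_SIZE_BITS	14
#define WP_ADDR_BITS	(64 - 1 - WP_SIZE_BITS)
#define WP_WRITE_MASK	(UINT64_C(1) << 63)
#define WP_ADDR_MASK	((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK	(((UINT64_C(1) << WP_SIZE_BITS) - 1) << WP_ADDR_BITS)

/* Pack one access into a single word; size must fit in WP_SIZE_BITS. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_MASK : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack; the address comes back masked, since the high bits were dropped. */
static void wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_ADDR_BITS);
	*is_write = (wp & WP_WRITE_MASK) != 0;
}
```

Because the high address bits are discarded, a decode can only be compared against the masked address, which is the same comparison test_encode_decode() makes with WATCHPOINT_ADDR_MASK.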
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
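The three steps can be modelled in plain userspace C. The sketch below is single-threaded and deliberately simplified: it keeps one unencoded watchpoint slot rather than an array of encoded longs, and it splits the stall into `begin_watch()`/`end_watch()` so the "concurrent" access can be simulated between the two calls. All names are hypothetical and none of this is the code in core.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WATCH_SIZE 8

/* One hypothetical watchpoint slot, stored unencoded for readability. */
static struct {
	bool active;
	uintptr_t addr;
	size_t size;
	bool is_write;
	unsigned char before[MAX_WATCH_SIZE]; /* value seen before the delay */
} slot;

/* Step 1: does this access conflict with the active watchpoint? */
static bool conflicts_with_watchpoint(const volatile void *ptr, size_t size,
				      bool is_write)
{
	uintptr_t addr = (uintptr_t)ptr;

	if (!slot.active || !(is_write || slot.is_write))
		return false; /* nothing watched, or both accesses are reads */
	return addr < slot.addr + slot.size && slot.addr < addr + size;
}

/* Step 2, plus the first half of step 3: arm the slot and snapshot the value. */
static void begin_watch(const volatile void *ptr, size_t size, bool is_write)
{
	assert(size <= MAX_WATCH_SIZE);
	slot.active = true;
	slot.addr = (uintptr_t)ptr;
	slot.size = size;
	slot.is_write = is_write;
	memcpy(slot.before, (const void *)ptr, size);
}

/* Second half of step 3: after the delay, disarm and compare the value. */
static bool end_watch(void)
{
	bool changed = memcmp(slot.before, (const void *)slot.addr,
			      slot.size) != 0;

	slot.active = false;
	return changed;
}
```

In this model, a write that hits a watched read between `begin_watch()` and `end_watch()` is caught by step 1. A value that changes without any instrumented access, such as a device write, is caught only by the comparison in `end_watch()`, which corresponds to the "race of unknown origin" report.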
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used even in
> + * compilation units that selectively disable KCSAN but must still use KCSAN to
> + * validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic access are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)

Building with clang 10, I still see the objtool warnings below, so
-fno-stack-protector does not seem to take effect for core.o:

  CC      kernel/kcsan/core.o
kernel/kcsan/core.o: warning: objtool: __kcsan_check_watchpoint()+0x228: call to __stack_chk_fail() with UACCESS enabled
kernel/kcsan/core.o: warning: objtool: __kcsan_setup_watchpoint()+0x3be: call to __stack_chk_fail() with UACCESS enabled


> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
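
For anyone following the slot iteration, here is a small userspace sketch
of the SLOT_IDX() arithmetic. KCSAN_CHECK_ADJACENT = 1 and
KCSAN_NUM_WATCHPOINTS = 64 are assumed values (their definitions are not
part of this hunk), and slot_idx() is a hypothetical function form of the
macro using int operands.

```c
#include <assert.h>

/* Assumed values for illustration; the real definitions are not shown here. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)

/* Hypothetical function form of SLOT_IDX(slot, i) from the patch. */
static int slot_idx(int slot, int i)
{
	assert(i >= 0 && i < CHECK_NUM_SLOTS);
	return (slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
			KCSAN_CHECK_ADJACENT)) % KCSAN_NUM_WATCHPOINTS;
}
```

With these values, slot 10 is visited as 10, 11, 9: the slot itself, then
the right neighbour, then the left one, which matches the comment. With
int operands as assumed here, slot 0 at i = 2 would evaluate to -1 because
C's % truncates toward zero; whether that can happen in the patch depends
on the real types, which I have not checked.
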
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
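
To make the slot life cycle above easier to review, here is a userspace
sketch with C11 atomics standing in for atomic_long_t. The *_wp() names
are hypothetical, and CONSUMED_WATCHPOINT = 1 is an assumed value (the
real one is defined in encoding.h, which is not in this hunk).
INVALID_WATCHPOINT = 0 follows the "zero-initialized" comment in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L  /* zero-initialized state, per the patch comment */
#define CONSUMED_WATCHPOINT 1L /* assumed value, for illustration only */

/* Claim a free slot: INVALID -> encoded. */
static bool insert_wp(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(wp, &expect, encoded,
			memory_order_relaxed, memory_order_relaxed);
}

/* Racing access: encoded -> CONSUMED; fails if the slot changed meanwhile. */
static bool try_consume_wp(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(wp, &encoded,
			CONSUMED_WATCHPOINT,
			memory_order_relaxed, memory_order_relaxed);
}

/* Owner releases the slot; true means nobody consumed it. */
static bool remove_wp(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) != CONSUMED_WATCHPOINT;
}
```

The point of the sketch: exactly one racing thread can win the
encoded -> CONSUMED transition, and the owner learns about it from the
value that its final exchange returns.
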
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
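
A minimal userspace sketch of the sampling rule in should_watch(), for
reference. A plain global stands in for the per-CPU kcsan_skip counter,
WATCH_SKIP_INST = 4 is an assumed stand-in for
CONFIG_KCSAN_WATCH_SKIP_INST, and should_watch_sketch() is a hypothetical
name.

```c
#include <assert.h>
#include <stdbool.h>

#define WATCH_SKIP_INST 4 /* stand-in for CONFIG_KCSAN_WATCH_SKIP_INST; value assumed */

static unsigned long skip_counter; /* stand-in for the per-CPU kcsan_skip */

/* Returns true for every WATCH_SKIP_INST-th non-atomic access. */
static bool should_watch_sketch(bool access_is_atomic)
{
	if (access_is_atomic)
		return false; /* atomics do not advance the counter */

	return (++skip_counter % WATCH_SKIP_INST) == 0;
}
```

Atomic accesses return early, so they neither set up watchpoints nor count
towards the skipped instructions.
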
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else

If it's flat, shouldn't we do WARN_ON(get_ctx()->atomic_region_flat) here,
to catch a flat region that is entered twice?

> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {

Similarly, WARN_ON(!get_ctx()->atomic_region_flat) here?

> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
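
Roughly what I have in mind for the two WARN_ON() suggestions, as a
userspace sketch. struct ctx_sketch and the *_sketch() functions are
hypothetical stand-ins, and the warnings counter stands in for WARN_ON()
firing; this is not meant as the final kernel code.

```c
#include <assert.h>
#include <stdbool.h>

struct ctx_sketch {
	int atomic_region;       /* nestable regions: a counter */
	bool atomic_region_flat; /* flat region: a single flag */
	int warnings;            /* stand-in for WARN_ON() firing */
};

static void begin_atomic_sketch(struct ctx_sketch *ctx, bool nest)
{
	if (nest) {
		++ctx->atomic_region;
	} else {
		if (ctx->atomic_region_flat)
			++ctx->warnings; /* flat region entered twice */
		ctx->atomic_region_flat = true;
	}
}

static void end_atomic_sketch(struct ctx_sketch *ctx, bool nest)
{
	if (nest) {
		if (ctx->atomic_region == 0)
			++ctx->warnings; /* end without matching begin */
		else
			--ctx->atomic_region;
	} else {
		if (!ctx->atomic_region_flat)
			++ctx->warnings; /* flat end without flat begin */
		ctx->atomic_region_flat = false;
	}
}
```

A flat region inside a nestable one stays silent, while a second flat
begin or an unmatched flat end is flagged.
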
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;

It would be reasonable to return -ENOENT to the user here.

> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);

This allocation can fail, and the result is not checked.

> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);

This can fail.
Would it be easier to use krealloc? It's useful to have a cap on the
list size anyway.
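
To make the suggestions above concrete (report a failed lookup, handle
allocation failure, grow with a realloc-style call, cap the list), here is
a rough userspace sketch. It is not the patch's code: realloc() stands in
for krealloc(), locking is left out, and struct filterlist,
filterlist_insert() and FILTERLIST_MAX are hypothetical names chosen for
illustration.

```c
#include <errno.h>
#include <stdlib.h>

#define FILTERLIST_MAX 1024 /* hypothetical cap on the number of entries */

/* Hypothetical stand-in for report_filterlist, without the lock. */
struct filterlist {
	unsigned long *addrs;
	size_t size; /* allocated entries */
	size_t used; /* populated entries */
};

/* Returns 0 on success or a negative errno the caller can pass on. */
static int filterlist_insert(struct filterlist *fl, unsigned long addr)
{
	if (!addr)
		return -ENOENT; /* symbol lookup failed */

	if (fl->used == fl->size) {
		size_t new_size = fl->size ? fl->size * 2 : 4;
		unsigned long *new_addrs;

		if (new_size > FILTERLIST_MAX)
			new_size = FILTERLIST_MAX;
		if (new_size == fl->size)
			return -ENOSPC; /* cap reached, refuse to grow */

		/* realloc() keeps the old array intact on failure. */
		new_addrs = realloc(fl->addrs, new_size * sizeof(*new_addrs));
		if (!new_addrs)
			return -ENOMEM;
		fl->addrs = new_addrs;
		fl->size = new_size;
	}

	fl->addrs[fl->used++] = addr;
	return 0;
}
```

With a return value like this, the write handler could hand the error
back to userspace instead of only logging it.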

> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;

EFAULT

> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))

It would be cleaner to use strcmp (trim trailing newline first).
Otherwise we accept anything starting with "on".
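
A small userspace sketch of the difference, assuming the argument is a
NUL-terminated, writable buffer. The helpers is_on_prefix(),
trim_newline() and is_on_exact() are made-up names for illustration only:

```c
#include <stdbool.h>
#include <string.h>

/* Prefix match, as in the quoted code: accepts anything starting with "on". */
static bool is_on_prefix(const char *arg)
{
	return !strncmp(arg, "on", sizeof("on") - 1);
}

/* Strip one trailing newline in place, e.g. from `echo on > file`. */
static void trim_newline(char *s)
{
	size_t len = strlen(s);

	if (len && s[len - 1] == '\n')
		s[len - 1] = '\0';
}

/* Exact match after trimming: only "on" is accepted. */
static bool is_on_exact(char *arg)
{
	trim_newline(arg);
	return !strcmp(arg, "on");
}
```

Here "only" passes the prefix check but is rejected by the exact one.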


> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still happens
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required KCSAN, but can still be emitted by the compiler.

Is "for" missing before KCSAN?

> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16

Increase it to 64 at least. No reason to truncate potentially useful info.

> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should to be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses to not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";

in_nmi() will return the wrong result for the other thread. We either
need to memorize it with the pid, or I would simply always print
"interrupt" because nmi/non-nmi is inferable from the stack if necessary.

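For the "memorize it with the pid" option, a rough userspace sketch of
the idea: the accessing thread records its own context, and the reporting
thread formats the description from that record instead of querying its
own state. struct thread_ctx and thread_desc() are hypothetical and not
part of the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record of the accessing context, saved alongside the PID. */
struct thread_ctx {
	int task_pid; /* -1 if not in task context */
	bool in_nmi;  /* sampled by the accessing thread itself */
};

/* Describe a thread from its recorded context, not the caller's. */
static const char *thread_desc(const struct thread_ctx *ctx, char *buf,
			       size_t len)
{
	if (ctx->task_pid != -1) {
		snprintf(buf, len, "task %i", ctx->task_pid);
		return buf;
	}
	return ctx->in_nmi ? "NMI" : "interrupt";
}
```

Always printing "interrupt" would avoid the extra field altogether.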

> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. If a test fails, the kernel panics.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         Enable KCSAN globally as soon as possible. KCSAN can later be
> +         enabled or disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         Randomize the delays; if disabled, the chosen delay is always the
> +         maximum value defined above.
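
To make the interaction between the two UDELAY_MAX options and DELAY_RANDOMIZE
concrete, here is a small sketch of one way the delay could be picked. The
function name pick_delay_us() and the exact formula are assumptions for
illustration only; the random value is passed in as a parameter so that the
sketch stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical delay selection: without randomization the delay is the
 * configured maximum; with randomization it falls in [1, max_us].
 * rnd stands in for a value from a pseudo-random number generator.
 */
static unsigned int pick_delay_us(unsigned int max_us, bool randomize,
				  unsigned int rnd)
{
	if (!randomize || max_us == 0)
		return max_us;
	return max_us - (rnd % max_us);
}
```

With the defaults above, a task would stall for at most 80 microseconds and
an interrupt for at most 20.
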
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
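
The "1 in KCSAN_WATCH_SKIP_INST" sampling described in this help text can be
pictured as a countdown counter. This is a minimal sketch, not the code from
core.c: struct skip_state, should_watch() and count_sampled() are hypothetical
names, and an ordinary struct stands in for per-CPU storage.

```c
#include <assert.h>
#include <stdbool.h>

#define WATCH_SKIP_INST 2000	/* mirrors the Kconfig default above */

/* Hypothetical per-CPU state; a real kernel would use a per-CPU variable. */
struct skip_state {
	long skip;
};

/*
 * Returns true for the one access in every WATCH_SKIP_INST that should
 * set up a watchpoint, and false for the accesses that are skipped.
 */
static bool should_watch(struct skip_state *s)
{
	if (--s->skip > 0)
		return false;
	s->skip = WATCH_SKIP_INST;
	return true;
}

/* Counts how many of n consecutive accesses would be sampled. */
static long count_sampled(long n)
{
	struct skip_state s = { .skip = WATCH_SKIP_INST };
	long sampled = 0;

	for (long i = 0; i < n; i++)
		if (should_watch(&s))
			sampled++;
	return sampled;
}
```

Lowering WATCH_SKIP_INST in this sketch raises the number of sampled accesses
proportionally, which is the detection versus performance trade-off the help
text describes.
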
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 11:08     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23 11:08 UTC
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

 On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN=y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred because the data value of the watched
> +memory location changed. Such races can occur due to either missing
> +instrumentation or, for example, DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately, this allows determining the possible executions of concurrent
> +code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
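
For readers who have not opened encoding.h yet, this is roughly what packing
access type, size, and address into one long can look like. The bit layout
and the wp_* names below are illustrative assumptions, not the values or
helpers from kernel/kcsan/encoding.h; the sketch uses a fixed 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bit 63 = write flag, bits 62..56 = size, bits 55..0 = address. */
#define WP_WRITE_BIT	(UINT64_C(1) << 63)
#define WP_SIZE_SHIFT	56
#define WP_SIZE_MASK	(UINT64_C(0x7f) << WP_SIZE_SHIFT)
#define WP_ADDR_MASK	((UINT64_C(1) << WP_SIZE_SHIFT) - 1)

/* Packs one access into a single word; size must fit in 7 bits here. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       (((uint64_t)size << WP_SIZE_SHIFT) & WP_SIZE_MASK) |
	       (addr & WP_ADDR_MASK);
}

/* Returns false for an empty slot; otherwise fills in the decoded fields. */
static bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	if (wp == 0)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_SIZE_SHIFT);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

A round trip through wp_encode() and wp_decode() returns the masked address,
so a comparison has to be made against the masked original, which is the same
kind of check the quoted test_encode_decode() performs.
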
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
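
The three steps above can be walked through with a single-threaded model. It
is a deliberately simplified sketch with one watchpoint slot and a callback
standing in for what other CPUs do during the stall; all names are
hypothetical and none of this is the code in core.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One hypothetical watchpoint slot; a real runtime would keep several. */
struct watchpoint {
	const volatile int *addr;
	bool is_write;
	bool active;
	bool consumed;	/* set by a racing access that hit the watchpoint */
};

static struct watchpoint slot;

enum race { RACE_NONE, RACE_KNOWN, RACE_UNKNOWN_ORIGIN };

/* Step 1: does this access hit an active watchpoint, with at least one write? */
static bool check_watchpoint(const volatile int *addr, bool is_write)
{
	if (slot.active && slot.addr == addr && (is_write || slot.is_write)) {
		slot.consumed = true;
		return false;	/* racing access found */
	}
	return true;
}

/*
 * Steps 2 and 3: set up a watchpoint, stall, then re-check the value.
 * during_delay() stands in for whatever other CPUs do while we stall.
 */
static enum race setup_watchpoint(const volatile int *addr, bool is_write,
				  void (*during_delay)(void))
{
	int before = *addr;
	enum race result = RACE_NONE;

	slot = (struct watchpoint){ .addr = addr, .is_write = is_write,
				    .active = true };
	if (during_delay)
		during_delay();

	if (slot.consumed)
		result = RACE_KNOWN;
	else if (*addr != before)
		result = RACE_UNKNOWN_ORIGIN;
	slot.active = false;
	return result;
}

static int shared;

/* An instrumented plain write from "another CPU". */
static void instrumented_writer(void)
{
	check_watchpoint(&shared, true);
	shared = 1;
}

/* An uninstrumented write, for example from a device. */
static void uninstrumented_writer(void)
{
	shared = 2;
}
```

In this model the instrumented writer is caught by step 1, while the
uninstrumented writer is only visible through the value change in step 3,
which corresponds to the "race at unknown origin" report type.
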
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used even
> + * in compilation units that selectively disable KCSAN but must still use KCSAN
> + * to validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)

Building with clang 10, I still see:

  CC      kernel/kcsan/core.o
kernel/kcsan/core.o: warning: objtool: __kcsan_check_watchpoint()+0x228: call to __stack_chk_fail() with UACCESS enabled
kernel/kcsan/core.o: warning: objtool: __kcsan_setup_watchpoint()+0x3be: call to __stack_chk_fail() with UACCESS enabled


> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else

If it's flat, shouldn't we do WARN_ON(get_ctx()->atomic_region_flat)?

> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {

WARN_ON(!get_ctx()->atomic_region_flat)?
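
To make both suggestions concrete, here is a rough userspace sketch of the begin/end pair with the two extra checks. It is only a model, not the patch's code: `struct ctx_model`, `model_begin_atomic()` and `model_end_atomic()` are made-up stand-ins for struct kcsan_ctx and kcsan_{begin,end}_atomic(), and a `false` return marks the spots where a WARN_ON() would fire in the kernel.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two fields in struct kcsan_ctx. */
struct ctx_model {
	int atomic_region;       /* nesting depth of nestable regions */
	bool atomic_region_flat; /* true while inside a flat region */
};

/* Returns true if the call was well-formed, false where WARN_ON() would fire. */
static bool model_begin_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest) {
		++ctx->atomic_region;
		return true;
	}
	if (ctx->atomic_region_flat)
		return false; /* flat region opened while already in one */
	ctx->atomic_region_flat = true;
	return true;
}

static bool model_end_atomic(struct ctx_model *ctx, bool nest)
{
	if (nest) {
		if (ctx->atomic_region == 0)
			return false; /* end without a matching begin */
		--ctx->atomic_region;
		return true;
	}
	if (!ctx->atomic_region_flat)
		return false; /* flat end without a flat begin */
	ctx->atomic_region_flat = false;
	return true;
}
```

A flat region inside a nestable one still passes both checks in this model, which is the seqlock reader-inside-writer case the kcsan_ctx comment describes.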

> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;

It would be reasonable to return -ENOENT to the user.

> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);

This can fail.

> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);

This can fail.
Would it be easier to use krealloc? It's useful to have a cap on list
size anyway.
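
Roughly what I have in mind, as a userspace sketch with realloc() standing in for krealloc(). The names `filterlist_model`, `filterlist_append()` and the `FILTERLIST_MAX` value are invented for illustration; the point is only that a failed allocation or a full list leaves the existing array untouched and is reported to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define FILTERLIST_MAX 1024 /* hypothetical cap on the number of entries */

/* Hypothetical stand-in for the relevant fields of report_filterlist. */
struct filterlist_model {
	unsigned long *addrs; /* array of addresses */
	size_t size;          /* allocated capacity, in elements */
	size_t used;          /* number of elements used */
};

/* Append addr; returns false (list unchanged) on allocation failure or cap. */
static bool filterlist_append(struct filterlist_model *fl, unsigned long addr)
{
	if (fl->used == fl->size) {
		size_t new_size = fl->size ? fl->size * 2 : 8;
		unsigned long *new_addrs;

		if (new_size > FILTERLIST_MAX)
			new_size = FILTERLIST_MAX;
		if (new_size == fl->size)
			return false; /* cap reached */

		/* realloc(NULL, n) behaves like malloc(n): covers the first insert. */
		new_addrs = realloc(fl->addrs, new_size * sizeof(*new_addrs));
		if (!new_addrs)
			return false; /* old array is still valid */

		fl->addrs = new_addrs;
		fl->size = new_size;
	}
	fl->addrs[fl->used++] = addr;
	return true;
}
```

This also folds the initial allocation and the resize into one path, so there is a single place to check for failure.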

> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;

This should be -EFAULT.

> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))

It would be cleaner to use strcmp (trim trailing newline first).
Otherwise we accept anything starting with "on".
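
Something along these lines, sketched in userspace. `enum cmd_model`, `trim_trailing()` and `parse_cmd()` are hypothetical names used only for this example; in the patch itself the buffer already goes through strstrip(), which should cover the trimming, so only the comparisons would need to change.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes for the commands debugfs_write() understands. */
enum cmd_model {
	CMD_ON,
	CMD_OFF,
	CMD_WHITELIST,
	CMD_BLACKLIST,
	CMD_FILTER,
	CMD_INVALID,
};

/* Strip trailing whitespace (e.g. the newline added by echo) in place. */
static void trim_trailing(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
}

/* Exact-match parsing: "on" is accepted, "only" is not. */
static enum cmd_model parse_cmd(char *arg)
{
	trim_trailing(arg);

	if (!strcmp(arg, "on"))
		return CMD_ON;
	if (!strcmp(arg, "off"))
		return CMD_OFF;
	if (!strcmp(arg, "whitelist"))
		return CMD_WHITELIST;
	if (!strcmp(arg, "blacklist"))
		return CMD_BLACKLIST;
	if (arg[0] == '!')
		return CMD_FILTER; /* rest of the string is a function name */
	return CMD_INVALID;
}
```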


> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still happens
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
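
To double-check the bit layout, here is a minimal userspace sketch of the
encoding above. It assumes a 64-bit long and 4 KiB pages, so that
WATCHPOINT_SIZE_BITS would be bits_per(8192) = 14. All sk_* names are made up
for illustration, and the masks are derived directly from the address-bit
count rather than copied from the GENMASK() bounds in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions of this sketch: 64-bit long, PAGE_SIZE == 4096. */
#define SK_BITS_PER_LONG 64
#define SK_SIZE_BITS 14 /* bits_per(2 * 4096) */
#define SK_ADDR_BITS (SK_BITS_PER_LONG - 1 - SK_SIZE_BITS)

#define SK_WRITE_MASK (UINT64_C(1) << (SK_BITS_PER_LONG - 1))
#define SK_ADDR_MASK ((UINT64_C(1) << SK_ADDR_BITS) - 1)
#define SK_SIZE_MASK (~(SK_WRITE_MASK | SK_ADDR_MASK))

/* Pack is-write (top bit), size (next 14 bits) and the low address bits. */
static inline uint64_t sk_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size < (UINT64_C(1) << SK_SIZE_BITS));
	return (is_write ? SK_WRITE_MASK : 0) |
	       ((uint64_t)size << SK_ADDR_BITS) |
	       (addr & SK_ADDR_MASK);
}

/* Returns false for the two reserved values (0 = invalid, 1 = consumed). */
static inline bool sk_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			     bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp & SK_SIZE_MASK) >> SK_ADDR_BITS);
	*is_write = (wp & SK_WRITE_MASK) != 0;
	return true;
}

/* True if decoding an encoded access yields the same size, type and masked address. */
static inline bool sk_roundtrip(uint64_t addr, size_t size, bool is_write)
{
	uint64_t addr_masked;
	size_t dec_size;
	bool dec_write;

	if (!sk_decode(sk_encode(addr, size, is_write), &addr_masked,
		       &dec_size, &dec_write))
		return false;

	return addr_masked == (addr & SK_ADDR_MASK) && dec_size == size &&
	       dec_write == is_write;
}
```

Two addresses that differ only in the discarded upper bits, e.g. 0x12345678
and 0xfffe000012345678, encode to the same value here. That is the aliasing
case the comment says the report logic filters out.
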
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required KCSAN, but can still be emitted by the compiler.

Is "for" missing before "KCSAN"?

> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16

Increase it to 64 at least. No reason to truncate potentially useful info.

> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should to be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses to not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";

in_nmi() is evaluated in the reporting thread, so it returns the wrong
answer for the other thread. We either need to record it together with the
pid, or, more simply, always print "interrupt", because NMI vs. non-NMI can
be inferred from the stack trace if necessary.


> +}
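
A minimal userspace sketch of the "record it together with the pid" option,
in case that route is preferred. The struct, enum and function names are
hypothetical and not part of the patch; the idea is only that the accessing
thread stores its own context, and the reporting thread formats what was
stored instead of querying its own state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: execution context captured by the thread doing the access. */
enum sk_ctx { SK_CTX_TASK, SK_CTX_IRQ, SK_CTX_NMI };

struct sk_thread_info {
	int task_pid;    /* meaningful only when ctx == SK_CTX_TASK */
	enum sk_ctx ctx; /* recorded at access time, not at report time */
};

/* Describe a thread purely from the recorded info. */
static inline const char *sk_thread_desc(const struct sk_thread_info *info,
					 char *buf, size_t len)
{
	assert(info && buf && len > 0);

	switch (info->ctx) {
	case SK_CTX_TASK:
		snprintf(buf, len, "task %d", info->task_pid);
		return buf;
	case SK_CTX_NMI:
		return "NMI";
	default:
		return "interrupt";
	}
}
```
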
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
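
For reference, the title ordering in print_summary() boils down to the
following, sketched in userspace. Note that this sketch compares plain
function names, whereas the patch compares the "%pS" strings (which include
offsets) and prints with "%ps"; sk_race_title is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the report title so that the same pair of functions always yields
 * the same string, regardless of which thread ends up printing the report.
 */
static inline void sk_race_title(char *buf, size_t len, const char *this_fn,
				 const char *other_fn)
{
	int cmp;

	assert(buf && this_fn && other_fn);
	cmp = strcmp(other_fn, this_fn);
	snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
		 cmp < 0 ? other_fn : this_fn,
		 cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two arguments gives an identical title, which is what keeps bug
titles stable for deduplication.
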
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
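
The five cases above read more easily as an overlap test on closed intervals
[addr, addr + size - 1]. A userspace sketch of the same check, assuming
non-zero sizes and no wrap-around at the top of the address space (the sk_
name is made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two accesses match if their closed byte ranges share at least one byte. */
static inline bool sk_matching_access(unsigned long addr1, size_t size1,
				      unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);
	end1 = addr1 + size1 - 1;
	end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

So (10, 2) covers bytes 10..11 and overlaps (11, 1), while (10, 1) and
(11, 1) are adjacent but disjoint.
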
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed");         \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
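
My reading of the KCSAN_WATCH_SKIP_INST help text, as a userspace sketch. The
countdown scheme and the sk_* names are assumptions for illustration only and
not necessarily how the core code implements the sampling; the point is just
that 1 in skip_inst memory operations on a CPU gets a watchpoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU sampling state. */
struct sk_sampler {
	long skip_inst; /* models CONFIG_KCSAN_WATCH_SKIP_INST */
	long remaining; /* operations left before the next watchpoint */
};

/* Called once per instrumented access; true means "set up a watchpoint". */
static inline bool sk_should_watch(struct sk_sampler *s)
{
	assert(s && s->skip_inst > 0);

	if (--s->remaining > 0)
		return false;

	s->remaining = s->skip_inst;
	return true;
}
```

With skip_inst = 2000, every 2000th access sets up a watchpoint. Lowering the
value samples more accesses at the cost of more time spent stalled in the
delay.
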
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
@ 2019-10-23 11:20     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23 11:20 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN=y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values differ, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
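
For readers following along, the check in step 1 and the watchpoint setup in
step 2 can be sketched in plain userspace C. This is only an illustration
under stated assumptions: the bit layout, NUM_SLOTS and all wp_* names are
hypothetical and are not the encoding from kernel/kcsan/encoding.h. The sketch
is single-threaded and looks at one slot only, whereas a real implementation
needs atomic operations on the slots and also checks adjacent slots.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout (NOT the one from encoding.h):
 * bit 63 = write flag, bits 62..56 = access size, bits 55..0 = address.
 */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)
#define WP_SIZE_SHIFT 56
#define WP_SIZE_MASK  UINT64_C(0x7f)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1)
#define NUM_SLOTS     64 /* assumed table size */

static uint64_t slots[NUM_SLOTS]; /* 0 means "slot is free" */

/* Map an address to a slot; 8-byte granules are an assumption. */
static size_t wp_slot(uint64_t addr)
{
	return (size_t)((addr >> 3) % NUM_SLOTS);
}

/* Pack access type, size and address into a single 64-bit word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       (((uint64_t)size & WP_SIZE_MASK) << WP_SIZE_SHIFT) |
	       (addr & WP_ADDR_MASK);
}

/* Step 2 (setup part): publish a watchpoint for the sampled access. */
static void wp_set(uint64_t addr, size_t size, bool is_write)
{
	slots[wp_slot(addr)] = wp_encode(addr, size, is_write);
}

/* Remove the watchpoint again once the delay is over. */
static void wp_clear(uint64_t addr)
{
	slots[wp_slot(addr)] = 0;
}

/*
 * Step 1: true if a watchpoint overlaps [addr, addr + size) and at
 * least one of the two accesses is a write.
 */
static bool wp_conflicts(uint64_t addr, size_t size, bool is_write)
{
	uint64_t wp = slots[wp_slot(addr)];
	uint64_t wp_addr, wp_size;

	if (wp == 0)
		return false; /* nothing is being watched here */
	if (!is_write && !(wp & WP_WRITE_BIT))
		return false; /* two reads never conflict */

	addr &= WP_ADDR_MASK;
	wp_addr = wp & WP_ADDR_MASK;
	wp_size = (wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK;
	return addr < wp_addr + wp_size && wp_addr < addr + size;
}
```

In this sketch a plain read would call wp_set(), stall, then wp_clear(), and
any write that calls wp_conflicts() on an overlapping range during the stall
is the racing access.
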
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that, setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
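
To spell out what "Supports nesting" implies for the disable counter in
struct kcsan_ctx, here is a small userspace sketch. It assumes the obvious
increment/decrement implementation, which is not shown in this part of the
patch; struct ctx_sketch and the ctx_* helpers are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the 'disable' field of struct kcsan_ctx. */
struct ctx_sketch {
	int disable; /* checks are active only while this is 0 */
};

/* Assumed behaviour of kcsan_disable_current(): bump the counter. */
static void ctx_disable(struct ctx_sketch *ctx)
{
	++ctx->disable;
}

/* Assumed behaviour of kcsan_enable_current(): drop the counter. */
static void ctx_enable(struct ctx_sketch *ctx)
{
	--ctx->disable;
}

/* Accesses are only checked when no disable call is outstanding. */
static bool ctx_enabled(const struct ctx_sketch *ctx)
{
	return ctx->disable == 0;
}
```

With a counter, an inner disable/enable pair cannot accidentally re-enable
checking for an outer region. It would also explain why init_task starts with
.disable = 1 in this patch: under this reading, one enable call is needed
before the init task is checked at all.
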
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still happens,
> + * the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \

I think we should define the unaligned versions as an __alias of these ones here.
That would make the code shorter, reduce icache pressure, and eliminate the
need to whitelist them in objtool (currently they are not whitelisted).
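
Roughly what the alias approach could look like, as a userspace sketch rather
than a tested kernel change. The demo_* names and DEMO_ALIAS() are hypothetical
stand-ins for the __tsan_* entry points, the __kcsan_check_{read,write} hooks
and the kernel's __alias() macro. It assumes GCC or Clang on an ELF target,
where the alias attribute is available:

```c
#include <stddef.h>

/* Hypothetical stand-ins for __kcsan_check_read()/__kcsan_check_write(). */
static size_t demo_bytes_read;
static size_t demo_bytes_written;

static void demo_check_read(const volatile void *ptr, size_t size)
{
	(void)ptr; /* the address is only recorded, never dereferenced */
	demo_bytes_read += size;
}

static void demo_check_write(const volatile void *ptr, size_t size)
{
	(void)ptr;
	demo_bytes_written += size;
}

/* Assumed equivalent of the kernel's __alias(); needs GCC/Clang on ELF. */
#define DEMO_ALIAS(sym) __attribute__((alias(#sym)))

/*
 * Emit one function body per access size; the unaligned entry point is a
 * second symbol for the same code instead of a duplicated function.
 */
#define DEFINE_DEMO_READ_WRITE(size)                                   \
	void demo_tsan_read##size(void *ptr)                           \
	{                                                              \
		demo_check_read(ptr, size);                            \
	}                                                              \
	void demo_tsan_unaligned_read##size(void *ptr)                 \
		DEMO_ALIAS(demo_tsan_read##size);                      \
	void demo_tsan_write##size(void *ptr)                          \
	{                                                              \
		demo_check_write(ptr, size);                           \
	}                                                              \
	void demo_tsan_unaligned_write##size(void *ptr)                \
		DEMO_ALIAS(demo_tsan_write##size)

DEFINE_DEMO_READ_WRITE(2);
DEFINE_DEMO_READ_WRITE(4);
DEFINE_DEMO_READ_WRITE(8);
DEFINE_DEMO_READ_WRITE(16);
```

With this shape the separate DEFINE_TSAN_UNALIGNED_READ_WRITE() macro would no
longer be needed, since each unaligned symbol resolves to the address of its
aligned counterpart.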

> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
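
The frame-skipping helper above can be tried outside the kernel. What follows is a minimal user-space sketch, assuming frames are already symbolized into strings; `first_reportable_frame` is a hypothetical name, and plain `strstr` stands in for the kernel-only `strnstr`:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of get_stack_skipnr(): returns the index
 * of the first frame whose symbol name contains none of the substrings that
 * mark sanitizer-internal functions, or num_frames if every frame matches.
 */
static size_t first_reportable_frame(const char *const frames[],
                                     size_t num_frames)
{
        size_t skip;

        for (skip = 0; skip < num_frames; ++skip) {
                const char *name = frames[skip];

                if (!strstr(name, "csan_") && !strstr(name, "tsan_") &&
                    !strstr(name, "_once_size"))
                        break;
        }
        return skip;
}
```

Note that when every frame matches, the sketch returns `num_frames`, so a caller has to compare the result against the frame count before using it as an index.
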
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
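
The five cases above are consistent with a plain overlap test on half-open byte ranges. A minimal user-space sketch, assuming that is the intended semantics; `ranges_overlap` and `check_overlap_cases` are hypothetical names, and the actual `matching_access` is defined in `encoding.h`, which is not quoted in this part of the mail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): true if the byte ranges
 * [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least one byte.
 * Assumes the sums do not wrap around.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
                           unsigned long addr2, size_t size2)
{
        return addr1 < addr2 + size2 && addr2 < addr1 + size1;
}

/* Replays the cases from test_matching_access() against the sketch. */
static void check_overlap_cases(void)
{
        assert(ranges_overlap(10, 1, 10, 1));  /* identical access */
        assert(ranges_overlap(10, 2, 11, 1));  /* second inside first */
        assert(ranges_overlap(10, 1, 9, 2));   /* first inside second */
        assert(!ranges_overlap(10, 1, 11, 1)); /* adjacent, no shared byte */
        assert(!ranges_overlap(9, 1, 10, 1));  /* adjacent, no shared byte */
}
```

Under this reading, two accesses that merely touch at a boundary (such as 1 byte at 9 and 1 byte at 10) are not treated as a match.
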
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
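
The "1 in KCSAN_WATCH_SKIP_INST" sampling described in this help text can be pictured with a simple countdown. This is only a sketch under stated assumptions: `skip_state`, `should_watch` and `WATCH_SKIP_INST` are hypothetical names, the state would be per-CPU in the kernel, and the patch's actual sampling code is in `core.c`, which is not quoted here:

```c
#include <stdbool.h>

/* Illustrative value only; mirrors the default of KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP_INST 2000

/* Hypothetical state: accesses left to skip before the next watchpoint. */
struct skip_state {
        long remaining;
};

/*
 * Returns true for exactly one access in every WATCH_SKIP_INST calls (the
 * access that would set up a watchpoint) and false for the others.
 */
static bool should_watch(struct skip_state *s)
{
        if (s->remaining > 0) {
                --s->remaining;
                return false;
        }
        s->remaining = WATCH_SKIP_INST - 1;
        return true;
}
```

A smaller constant makes `should_watch` return true more often, which matches the trade-off in the help text: more watchpoints and more delays, but a better chance of catching a race.
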
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 11:20     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 11:20 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the two threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
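
As a reading aid for the definition above, here is a minimal user-space sketch of the predicate. The struct and function names are hypothetical and not part of the series. I am also assuming that "same memory location" can be approximated by overlapping byte ranges, and the `ordered` flag stands in for the LKMM happens-before relation, which this sketch does not compute.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one memory access; not a KCSAN type. */
struct access {
	uintptr_t addr;  /* start address */
	size_t size;     /* number of bytes accessed */
	bool is_write;
	bool is_atomic;  /* marked access, e.g. READ_ONCE()/WRITE_ONCE() */
	int thread;      /* id of the issuing thread */
};

/* Two accesses conflict if their byte ranges overlap and at least one writes. */
static bool conflict(const struct access *a, const struct access *b)
{
	bool overlap = a->addr < b->addr + b->size &&
		       b->addr < a->addr + a->size;

	return overlap && (a->is_write || b->is_write);
}

/*
 * Data-race per the quoted definition: different threads, conflicting,
 * at least one plain access, and not ordered by happens-before.
 * 'ordered' must be supplied by the caller.
 */
static bool is_data_race(const struct access *a, const struct access *b,
			 bool ordered)
{
	return a->thread != b->thread && conflict(a, b) &&
	       (!a->is_atomic || !b->is_atomic) && !ordered;
}
```

Note that a plain write racing with a marked read still counts, since only one side needs to be plain.
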
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
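
To make the "access type, size, and address in a long" idea concrete for readers of the doc, below is a rough sketch of one possible packing. The bit layout (1 write bit, 14 size bits, remaining bits for the address, all-zero meaning invalid) and all names are my assumptions for illustration; the authoritative layout is whatever kernel/kcsan/encoding.h in this series defines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, 64-bit only: [63] write | [62:49] size | [48:0] addr */
#define WP_SIZE_BITS 14
#define WP_ADDR_BITS (63 - WP_SIZE_BITS)
#define WP_WRITE_BIT (UINT64_C(1) << 63)
#define WP_ADDR_MASK ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK (((UINT64_C(1) << WP_SIZE_BITS) - 1) << WP_ADDR_BITS)
#define WP_INVALID   UINT64_C(0)

/* Pack one access into a single word; size must be in [1, 2^14 - 1]. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack; returns false for the invalid (empty) encoding. */
static bool wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	if (wp == WP_INVALID)
		return false;

	*addr_masked = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_ADDR_BITS);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the upper address bits are dropped, a lookup has to compare masked addresses, which is what find_watchpoint() appears to do with WATCHPOINT_ADDR_MASK.
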
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
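
The three steps above can be sketched in user space as follows. This is a single-threaded toy under stated assumptions, not the code in core.c: the table size, the sampling interval, and every name are hypothetical, and the "delay" is a callback that plays the role of whatever other contexts do while the watchpoint owner stalls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_WP 8 /* hypothetical table size */
#define SKIP   4 /* hypothetical: sample every 4th plain access */

enum result { NO_RACE, RACE, RACE_UNKNOWN_ORIGIN };

struct soft_wp {
	uintptr_t addr;
	size_t size;
	bool is_write;
	bool in_use;
	bool hit; /* set by a conflicting access while the owner stalls */
};

static struct soft_wp table[NUM_WP];
static unsigned long skip_counter;

/* Step 1: returns false if the access conflicts with a live watchpoint. */
static bool check_watchpoint(const void *ptr, size_t size, bool is_write)
{
	uintptr_t addr = (uintptr_t)ptr;

	for (int i = 0; i < NUM_WP; i++) {
		struct soft_wp *wp = &table[i];

		if (!wp->in_use || !(is_write || wp->is_write))
			continue;
		if (addr < wp->addr + wp->size && wp->addr < addr + size) {
			wp->hit = true;
			return false;
		}
	}
	return true;
}

/* Steps 1-3 for a plain access; 'stall' stands in for the delay. */
static enum result plain_access(const void *ptr, size_t size, bool is_write,
				void (*stall)(void))
{
	unsigned char before[sizeof(long)], after[sizeof(long)];
	struct soft_wp *wp = NULL;
	enum result res = NO_RACE;

	if (!check_watchpoint(ptr, size, is_write))
		return RACE;
	if (size > sizeof(before) || ++skip_counter % SKIP != 0)
		return NO_RACE; /* not sampled this time */

	/* Step 2: claim a free slot and stall. */
	for (int i = 0; i < NUM_WP && !wp; i++)
		if (!table[i].in_use)
			wp = &table[i];
	if (!wp)
		return NO_RACE;
	*wp = (struct soft_wp){ (uintptr_t)ptr, size, is_write, true, false };

	/* Step 3: compare the value before and after the stall. */
	memcpy(before, ptr, size);
	stall();
	memcpy(after, ptr, size);

	if (wp->hit)
		res = RACE;
	else if (memcmp(before, after, size) != 0)
		res = RACE_UNKNOWN_ORIGIN;
	wp->in_use = false;
	return res;
}

/* Demo hooks: what happens "concurrently" during the stall. */
static int shared;

static void stall_idle(void) { }

static void stall_racy_write(void)
{
	/* An instrumented plain write from another context. */
	plain_access(&shared, sizeof(shared), true, stall_idle);
	shared++;
}

static void stall_device_write(void)
{
	/* An uninstrumented write, e.g. from a device. */
	shared++;
}
```

With this model, a sampled read that stalls over `stall_racy_write` yields RACE, while stalling over `stall_device_write` yields RACE_UNKNOWN_ORIGIN, which matches the two report types in the doc. An atomic access would only run step 1.
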
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
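
I worked through the iteration order of SLOT_IDX in user space to convince myself of the "slot itself, then right, then left" claim. The two constants below are assumed values (they are defined elsewhere in the series), and I am assuming they are plain ints; `slot_idx_wrapped` is a hypothetical variant, not something in the patch.

```c
/* Assumed values; the real definitions live elsewhere in the series. */
#define KCSAN_CHECK_ADJACENT  1
#define KCSAN_NUM_WATCHPOINTS 64

/* Copied from the hunk above. */
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                                      \
	((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
		  KCSAN_CHECK_ADJACENT)) %                                     \
	 KCSAN_NUM_WATCHPOINTS)

/* The macro as written, evaluated with int operands. */
static int slot_idx(int slot, int i)
{
	return SLOT_IDX(slot, i);
}

/*
 * Hypothetical variant: add the table size before reducing, so the
 * left neighbour of slot 0 wraps to the last slot instead of going
 * negative under C's truncating '%'.
 */
static int slot_idx_wrapped(int slot, int i)
{
	int off = ((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -
		  KCSAN_CHECK_ADJACENT;

	return (slot + off + KCSAN_NUM_WATCHPOINTS) % KCSAN_NUM_WATCHPOINTS;
}
```

For slot 5 the order is 5, 6, 4 as the comment says. For slot 0 and i == 2, the int version evaluates to -1, since `%` in C truncates toward zero. If KCSAN_NUM_WATCHPOINTS has an unsigned type the usual conversions make this wrap to the last slot instead, so whether this is a real problem depends on definitions I cannot see in this hunk.
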
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
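
For anyone following along, the insert/consume/remove lifecycle of a single slot can be modelled with C11 atomics as below. This is only a sketch: INVALID_WATCHPOINT being 0 follows from the "zero-initialized state" comment above, but the value of CONSUMED_WATCHPOINT is my assumption, and the `wp_*` names are made up.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT  0L /* matches the zero-initialized state */
#define CONSUMED_WATCHPOINT 1L /* assumed value; defined in encoding.h */

/* Owner: claim the slot only if it is currently empty. */
static bool wp_insert(atomic_long *slot, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Racing access: mark the slot consumed only if it still holds 'encoded'. */
static bool wp_try_consume(atomic_long *slot, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &encoded, CONSUMED_WATCHPOINT,
		memory_order_relaxed, memory_order_relaxed);
}

/* Owner: release the slot; true if nobody consumed it in the meantime. */
static bool wp_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

The compare-and-exchange in `wp_try_consume` is what makes the three "may return false" cases in the comment fall out naturally: in each of them the slot no longer holds the value the racing access read earlier.
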
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
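
The sampling decision above can be approximated in userspace as follows. This is only a sketch: WATCH_SKIP_INST is a hypothetical stand-in for CONFIG_KCSAN_WATCH_SKIP_INST (the value 2000 is an assumption), and a thread-local counter stands in for the per-CPU kcsan_skip counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP_INST 2000

/* A per-thread counter approximates the per-CPU counter in the patch. */
static _Thread_local unsigned long skip_count;

static bool should_watch_sketch(bool access_is_atomic)
{
	/* Atomic accesses never get a watchpoint and are not counted. */
	if (access_is_atomic)
		return false;

	/* Every WATCH_SKIP_INST-th plain access is selected. */
	return (++skip_count % WATCH_SKIP_INST) == 0;
}
```

Checking for atomics first keeps them from advancing the counter, so the sampling interval is measured in plain accesses only.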
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
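
The lookup above boils down to sort, binary search, then inverting the result in whitelist mode. A minimal userspace sketch, assuming the libc qsort() and bsearch() in place of the kernel's sort() and bsearch(), no locking, and an unconditional sort instead of the cached "sorted" flag (skip_report_sketch is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/*
 * Return true if a report for the function starting at func_start should be
 * skipped. Sorts addrs in place on every call, unlike the patch.
 */
static bool skip_report_sketch(unsigned long *addrs, size_t used,
			       bool whitelist, unsigned long func_start)
{
	bool found;

	/* An empty list filters nothing, even in whitelist mode. */
	if (used == 0)
		return false;

	qsort(addrs, used, sizeof(*addrs), cmp_addrs);
	found = bsearch(&func_start, addrs, used, sizeof(*addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip all others. */
	return whitelist ? !found : found;
}
```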
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] = addr;
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS)
> + * bits; however, most 64-bit architectures do not use the full 64-bit address
> + * space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
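
To make the packing concrete, here is a simplified round-trip sketch. It assumes 64-bit words and 14 size bits (enough for sizes up to 2 * 4 KiB), and it derives its masks directly from the field widths, so it is not a copy of the GENMASK() definitions in the patch; all WP_* and wp_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [63] is-write | [62:49] size | [48:0] masked address. */
#define WP_SIZE_BITS 14
#define WP_ADDR_BITS (64 - 1 - WP_SIZE_BITS)
#define WP_WRITE_MASK (UINT64_C(1) << 63)
#define WP_ADDR_MASK ((UINT64_C(1) << WP_ADDR_BITS) - 1)
#define WP_SIZE_MASK (((UINT64_C(1) << WP_SIZE_BITS) - 1) << WP_ADDR_BITS)

static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? WP_WRITE_MASK : 0) |
	       ((uint64_t)size << WP_ADDR_BITS) |
	       (addr & WP_ADDR_MASK);
}

static void wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	*addr_masked = wp & WP_ADDR_MASK;
	*size = (size_t)((wp & WP_SIZE_MASK) >> WP_ADDR_BITS);
	*is_write = (wp & WP_WRITE_MASK) != 0;
}
```

Two addresses that differ only in the discarded upper bits encode to the same value, which is the source of the potential false positives described in the comment above.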
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
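
The slot mapping and the overlap test above can be exercised in userspace with the sketch below. It assumes a 4 KiB page size and 64 watchpoints; slot_of and ranges_overlap are hypothetical names mirroring watchpoint_slot() and matching_access().

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for PAGE_SIZE and KCSAN_NUM_WATCHPOINTS. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_NUM_WATCHPOINTS 64UL

/* All addresses within one page map to the same slot. */
static unsigned long slot_of(unsigned long addr)
{
	return (addr / SKETCH_PAGE_SIZE) % SKETCH_NUM_WATCHPOINTS;
}

/* Inclusive-range overlap: true if the two accesses share a byte. */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1;
	unsigned long end2 = addr2 + size2 - 1;

	return addr1 <= end2 && addr2 <= end1;
}
```

Using inclusive end addresses means that two back-to-back accesses, such as 8 bytes at 0x1000 and 8 bytes at 0x1008, are not treated as overlapping.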
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \

I think we need to define the unaligned versions as an __alias of the
aligned ones here. That would make the code shorter, reduce icache
pressure, and eliminate the need to whitelist them in objtool (currently
they are not whitelisted).
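
A minimal userspace sketch of the alias approach, assuming GCC or Clang on an ELF target (the alias attribute is not available on every platform). The demo_* names and check_read() are hypothetical stand-ins for the __tsan_* entry points and __kcsan_check_read(); in the kernel, the __alias() macro would be used instead of spelling out the attribute.

```c
#include <stddef.h>

/* Hypothetical stand-in for __kcsan_check_read(); counts bytes checked. */
static size_t checked_bytes;

static void check_read(const void *ptr, size_t size)
{
	(void)ptr;
	checked_bytes += size;
}

/*
 * Define the aligned entry point once, then declare the unaligned entry
 * point as an alias of it: both symbols resolve to the same code.
 */
#define DEFINE_DEMO_READ(size)                                                 \
	void demo_read##size(void *ptr)                                        \
	{                                                                      \
		check_read(ptr, size);                                         \
	}                                                                      \
	void demo_unaligned_read##size(void *ptr)                              \
		__attribute__((alias("demo_read" #size)))

DEFINE_DEMO_READ(4);
DEFINE_DEMO_READ(8);
```

With this shape there is only one function body per size, so only one symbol per size would need special treatment in tooling.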

> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
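
The hand-off through other_info above can be read as a small state machine.
Below is a minimal single-threaded userspace sketch of it, assuming a 16-bit
address mask; the names handoff_result, publish_access, consume_access,
release_slot and SKETCH_ADDR_MASK are made up for illustration and are not in
the patch. Locking, IRQ state and stack traces are left out on purpose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for WATCHPOINT_ADDR_MASK, whose value is not shown here. */
#define SKETCH_ADDR_MASK ((uintptr_t)0xffff)

enum handoff_result { HANDOFF_RETRY, HANDOFF_DONE, HANDOFF_REPORT };

/* Stands in for other_info; zero-initialised means "free". */
static struct {
	uintptr_t addr;
	size_t size;
	bool valid;
} slot;

static bool overlaps(uintptr_t a1, size_t s1, uintptr_t a2, size_t s2)
{
	return a1 <= a2 + s2 - 1 && a2 <= a1 + s1 - 1;
}

/* Side that hit a watchpoint: publish the access if the slot is free. */
static enum handoff_result publish_access(uintptr_t addr, size_t size)
{
	if (slot.valid)
		return HANDOFF_RETRY; /* still in use */
	slot.addr = addr;
	slot.size = size;
	slot.valid = true;
	return HANDOFF_DONE; /* the other side prints the report */
}

/* Side that set up the watchpoint: wait for matching data, then report. */
static enum handoff_result consume_access(uintptr_t addr, size_t size)
{
	if (!slot.valid)
		return HANDOFF_RETRY; /* no data available yet */
	if (!overlaps(slot.addr & SKETCH_ADDR_MASK, slot.size,
		      addr & SKETCH_ADDR_MASK, size))
		return HANDOFF_RETRY; /* data belongs to another watchpoint */
	if (!overlaps(slot.addr, slot.size, addr, size)) {
		slot.valid = false; /* encoding false positive, free the slot */
		return HANDOFF_DONE;
	}
	return HANDOFF_REPORT; /* caller prints, then calls release_slot() */
}

static void release_slot(void)
{
	slot.valid = false;
}
```

In the patch the retry cases drop other_info_lock and loop, and the report case
returns with the lock still held until end_report().
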
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
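
For reference, the frame-skipping rule above can be tried out in userspace on
already-symbolized names. This is only a sketch: skip_runtime_frames is a
made-up name, and it takes strings where the patch formats addresses with %ps.

```c
#include <string.h>

/*
 * Return the index of the first frame whose name does not look like a
 * sanitizer-runtime or READ_ONCE/WRITE_ONCE helper frame. Returns num_frames
 * if every frame matches one of the patterns.
 */
static int skip_runtime_frames(const char *names[], int num_frames)
{
	int skip = 0;

	for (; skip < num_frames; ++skip) {
		if (!strstr(names[skip], "csan_") &&
		    !strstr(names[skip], "tsan_") &&
		    !strstr(names[skip], "_once_size"))
			break;
	}
	return skip;
}
```

Note that the result equals num_frames when every frame matches, so a caller
that indexes the array with it (as print_summary does with
stack_entries[skipnr]) appears to rely on at least one non-matching frame
being present.
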
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
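
The title ordering in print_summary() above is what makes the same pair of
functions produce the same bug title regardless of which side set up the
watchpoint. A minimal userspace sketch, assuming plain function-name strings
instead of %ps formatting (format_race_title is a hypothetical helper, not
part of the patch):

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the report title into buf, with the two function names ordered by
 * strcmp() so that swapping the arguments yields the same title.
 */
static int format_race_title(char *buf, size_t len, const char *this_fn,
			     const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	return snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
			cmp < 0 ? other_fn : this_fn,
			cmp < 0 ? this_fn : other_fn);
}
```
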
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
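
The five cases above are what one would expect from an inclusive range-overlap
check. matching_access() itself lives in encoding.h, which is not quoted in
this part of the mail, so the following is only a sketch that is consistent
with these test cases (ranges_overlap is a made-up name):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True if [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least
 * one byte. Sizes are assumed to be non-zero.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1 = addr1 + size1 - 1; /* last byte of range 1 */
	unsigned long end2 = addr2 + size2 - 1; /* last byte of range 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```
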
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>


* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 11:20     ` Dmitry Vyukov
  0 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 11:20 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN configure kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and if that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
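For readers without encoding.h at hand, a rough sketch of what packing access type, size, and address into one word could look like. The bit layout below (write flag in bit 63, size in bits 56 to 62, address in the low 56 bits) and all `sk_`/`SK_` names are assumptions made for illustration, not the layout used by the patch. The only detail taken from the patch is that the all-zero value denotes an invalid watchpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration only; see encoding.h for the real one. */
#define SK_WRITE_BIT  (UINT64_C(1) << 63)
#define SK_SIZE_SHIFT 56
#define SK_SIZE_MASK  UINT64_C(0x7f)
#define SK_ADDR_MASK  ((UINT64_C(1) << SK_SIZE_SHIFT) - 1)
#define SK_INVALID    UINT64_C(0)

/* Pack one access into a single word; size >= 1 keeps the result non-zero. */
static uint64_t sk_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size >= 1 && size <= SK_SIZE_MASK);
	return (is_write ? SK_WRITE_BIT : 0) |
	       ((uint64_t)size << SK_SIZE_SHIFT) |
	       (addr & SK_ADDR_MASK);
}

/* Unpack a watchpoint; returns false for the invalid (empty) encoding. */
static bool sk_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	if (wp == SK_INVALID)
		return false;
	*addr_masked = wp & SK_ADDR_MASK;
	*size = (size_t)((wp >> SK_SIZE_SHIFT) & SK_SIZE_MASK);
	*is_write = (wp & SK_WRITE_BIT) != 0;
	return true;
}
```

Because the whole watchpoint fits in one word, it can be published and retired with single atomic operations, which is what lets the fast path avoid shared locks.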
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
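The three steps and the atomic-access rule can be sketched as a single-threaded user-space model. Everything here is a simplification under stated assumptions: there is one watchpoint slot, a fixed sampling interval `SK_SAMPLE_EVERY`, and the `stall` callback stands in for whatever other CPUs or devices do during the delay. None of the `sk_` names exist in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sk_result { SK_NO_RACE, SK_RACE, SK_RACE_UNKNOWN_ORIGIN };

struct sk_watchpoint {
	const void *addr;
	bool is_write;
	bool armed;
	bool hit; /* set by a conflicting access that found the watchpoint */
};

#define SK_SAMPLE_EVERY 4 /* hypothetical sampling interval */

static struct sk_watchpoint sk_wp; /* one slot is enough for a sketch */
static unsigned long sk_skip;
static int sk_shared;              /* demo variable being accessed */

/* Step 1: every instrumented access checks for a matching watchpoint. */
static bool sk_check(const void *addr, bool is_write)
{
	if (sk_wp.armed && sk_wp.addr == addr &&
	    (is_write || sk_wp.is_write)) {
		sk_wp.hit = true;
		return true; /* racing access */
	}
	return false;
}

/* Atomic accesses only check; they never set up a watchpoint. */
static bool sk_atomic_access(const void *addr, bool is_write)
{
	return sk_check(addr, is_write);
}

/* Steps 1-3 for a plain access; stall() models the delay window. */
static enum sk_result sk_plain_access(const void *addr, size_t size,
				      bool is_write, void (*stall)(void))
{
	unsigned char before[8];
	enum sk_result res = SK_NO_RACE;

	assert(size <= sizeof(before));
	if (sk_check(addr, is_write))
		return SK_RACE;
	if (++sk_skip % SK_SAMPLE_EVERY != 0)
		return SK_NO_RACE; /* step 2: this access is not sampled */

	sk_wp = (struct sk_watchpoint){ addr, is_write, true, false };
	memcpy(before, addr, size); /* step 3: snapshot the value */
	stall();
	if (sk_wp.hit)
		res = SK_RACE;
	else if (memcmp(before, addr, size) != 0)
		res = SK_RACE_UNKNOWN_ORIGIN;
	sk_wp.armed = false;
	return res;
}

/* Things that may happen during the delay, for demonstration. */
static void sk_stall_racing_write(void) { sk_atomic_access(&sk_shared, true); }
static void sk_stall_device_write(void) { sk_shared++; /* uninstrumented */ }
static void sk_stall_quiet(void) { }
```

A marked write that lands during the delay is reported as a race against the watched plain read, whereas an uninstrumented write is only noticed through the changed value.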
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used even in
> + * compilation units that selectively disable KCSAN but still need KCSAN to
> + * validate accesses to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
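To check the probe order these macros produce, here is a small stand-alone sketch. The values of `SK_CHECK_ADJACENT` and `SK_NUM_WATCHPOINTS` are assumptions picked for illustration, since the real constants are defined in headers not quoted in this part of the patch. The macro body is copied from above with extra parentheses around the arguments.

```c
#include <assert.h>

/* Assumed values for illustration; the real ones are defined elsewhere. */
#define SK_CHECK_ADJACENT  1
#define SK_NUM_WATCHPOINTS 64

#define SK_CHECK_NUM_SLOTS (1 + 2 * SK_CHECK_ADJACENT)
#define SK_SLOT_IDX(slot, i)                                               \
	(((slot) + ((((i) + SK_CHECK_ADJACENT) % SK_CHECK_NUM_SLOTS) -     \
		    SK_CHECK_ADJACENT)) %                                   \
	 SK_NUM_WATCHPOINTS)

/* Fill out[] with the slots probed for a given home slot, in probe order. */
static void sk_probe_order(int slot, int out[SK_CHECK_NUM_SLOTS])
{
	assert(slot >= 0 && slot < SK_NUM_WATCHPOINTS);
	for (int i = 0; i < SK_CHECK_NUM_SLOTS; ++i)
		out[i] = SK_SLOT_IDX(slot, i);
}
```

With these assumed values, home slot 10 is probed as 10, 11, 9, which matches the "slot itself, then right, then left" description, and slot 63 wraps its right neighbour to 0. One thing that may be worth double-checking: with plain `int` operands, C's `%` keeps the sign of the dividend, so home slot 0 yields -1 for the left neighbour in this sketch. Whether that can happen in the patch depends on the types and values behind `watchpoint_slot()` and `KCSAN_NUM_WATCHPOINTS`, which are not visible here.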
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
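The insert/consume/remove protocol above can be modelled in user space with C11 atomics, which may help when reasoning about the three failure cases listed for try_consume_watchpoint(). This is only a sketch: `SK_CONSUMED` is an assumed value (the real constant lives in encoding.h), the `sk_` helpers are hypothetical, and the strong compare-exchange is used in place of the kernel's `atomic_long_try_cmpxchg_relaxed()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_INVALID  0L /* matches the zero-initialized state in the patch */
#define SK_CONSUMED 1L /* assumed value; the real constant is in encoding.h */

/* Owner: claim a free slot for an encoded watchpoint. */
static bool sk_insert(atomic_long *slot, long encoded)
{
	long expect = SK_INVALID;

	assert(encoded != SK_INVALID && encoded != SK_CONSUMED);
	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Racing access: mark the watchpoint consumed if it is still the one seen. */
static bool sk_try_consume(atomic_long *slot, long seen)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &seen, SK_CONSUMED,
		memory_order_relaxed, memory_order_relaxed);
}

/* Owner: free the slot; true means nobody consumed it in the meantime. */
static bool sk_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, SK_INVALID,
					memory_order_relaxed) != SK_CONSUMED;
}
```

The owner learns about a race solely from the return value of the final exchange, so no additional shared state is needed between the two sides.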
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
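The interaction between `atomic_next`, nestable regions, and flat regions is easier to follow with a small model. The sketch below mirrors the fields of struct kcsan_ctx and the logic of is_atomic(), but it passes the context explicitly, omits the kcsan_is_atomic() fallback for volatile globals, and leaves out the mismatch warning. The `sk_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the atomic-related fields of struct kcsan_ctx. */
struct sk_ctx {
	int atomic_next;
	int atomic_region;
	bool atomic_region_flat;
};

/* Should the next access in this context be treated as atomic? */
static bool sk_is_atomic(struct sk_ctx *ctx)
{
	assert(ctx);
	if (ctx->atomic_next > 0) {
		--ctx->atomic_next; /* each access uses up one credit */
		return true;
	}
	return ctx->atomic_region > 0 || ctx->atomic_region_flat;
}

static void sk_begin_atomic(struct sk_ctx *ctx, bool nest)
{
	if (nest)
		++ctx->atomic_region;
	else
		ctx->atomic_region_flat = true;
}

static void sk_end_atomic(struct sk_ctx *ctx, bool nest)
{
	if (nest)
		--ctx->atomic_region; /* the patch also warns on underflow */
	else
		ctx->atomic_region_flat = false;
}
```

In this model, ending a flat region that sits inside a nestable one leaves the outer region intact, which is the seqlock reader-inside-writer case described in the comment on struct kcsan_ctx.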
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();
> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();
> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
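
For readers following the "race of unknown origin" path above: the idea is to
snapshot the value, stall, and re-read it. Below is a minimal userspace sketch
of just that step. All demo_* names are made up for illustration, and a
callback stands in for udelay(get_delay()); none of it is code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: snapshot the value at ptr for sizes 1, 2, 4 and 8.
 * Returns false for any other size, which is not diffed (as in the patch).
 */
static bool demo_snapshot(const volatile void *ptr, size_t size, uint64_t *out)
{
	switch (size) {
	case 1:
		*out = *(const volatile uint8_t *)ptr;
		return true;
	case 2:
		*out = *(const volatile uint16_t *)ptr;
		return true;
	case 4:
		*out = *(const volatile uint32_t *)ptr;
		return true;
	case 8:
		*out = *(const volatile uint64_t *)ptr;
		return true;
	default:
		return false;
	}
}

/*
 * Returns true if the value at ptr differs before and after stall(arg).
 * stall() stands in for the delay during which a conflicting access may
 * happen.
 */
static bool demo_value_changed(const volatile void *ptr, size_t size,
			       void (*stall)(void *), void *arg)
{
	uint64_t before = 0, after = 0;
	bool diffable = demo_snapshot(ptr, size, &before);

	stall(arg);

	return diffable && demo_snapshot(ptr, size, &after) && before != after;
}

/* Stand-in for an uninstrumented writer, e.g. a device. */
static void demo_device_write(void *arg)
{
	*(volatile uint32_t *)arg = 0xdeadbeef;
}

/* Stand-in for a stall during which nothing touches the location. */
static void demo_no_write(void *arg)
{
	(void)arg;
}
```

Calling demo_value_changed() with demo_device_write as the stall reports a
change; with demo_no_write, or with an unsupported size such as 16, it does not.
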
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
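
The lookup above sorts lazily and then does a binary search. Note that an empty
list skips nothing, even in whitelist mode. A minimal userspace sketch of the
same logic, using qsort()/bsearch() from the C library in place of the kernel's
sort()/bsearch(), with locking left out; the demo_* names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_filterlist {
	unsigned long *addrs; /* function start addresses */
	size_t used;          /* number of valid entries */
	bool sorted;          /* true once addrs is sorted */
	bool whitelist;       /* true: only listed functions are reported */
};

static int demo_cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if reports for func_addr should be suppressed. */
static bool demo_skip_report(struct demo_filterlist *fl,
			     unsigned long func_addr)
{
	bool found;

	if (fl->used == 0)
		return false;

	/* Sort on first use after an insertion, then binary search. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), demo_cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_addr, fl->addrs, fl->used, sizeof(*fl->addrs),
			demo_cmp_addrs) != NULL;

	return fl->whitelist ? !found : found;
}
```
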
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
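
A few things stand out in insert_report_filterlist(): kvmalloc_array() with
GFP_KERNEL may sleep, yet it is called with report_filterlist_lock held and
interrupts disabled; neither allocation is checked for NULL; and the symbol is
looked up a second time although addr already holds the result. Below is a
minimal userspace sketch of the grow-by-doubling append with the failure path
handled. It uses realloc(), starts from an empty list rather than a preset
size of 8, and omits locking; all demo_* names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_INITIAL_SIZE 8

struct demo_addr_list {
	unsigned long *addrs;
	size_t size; /* allocated capacity, in elements */
	size_t used; /* number of elements in use */
	bool sorted;
};

/* Returns 0 on success, -1 if memory could not be allocated. */
static int demo_list_append(struct demo_addr_list *list, unsigned long addr)
{
	if (list->used == list->size) {
		size_t new_size = list->size ? list->size * 2 :
					       DEMO_INITIAL_SIZE;
		unsigned long *new_addrs =
			realloc(list->addrs, new_size * sizeof(*new_addrs));

		if (!new_addrs)
			return -1; /* the list is left unchanged */
		list->addrs = new_addrs;
		list->size = new_size;
	}

	list->addrs[list->used++] = addr;
	list->sorted = false; /* force a re-sort on the next lookup */
	return 0;
}
```

In the kernel, the equivalent would need either a non-sleeping allocation or
the allocation moved outside the locked region; which one fits better here is
left open.
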
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
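
The strncmp() calls above only compare the prefix, so for example "once" is
accepted as "on". If exact commands are intended, the parsing could look
roughly like the userspace sketch below. The enum and the demo_* names are
hypothetical, and rejecting a bare "!" is an assumption on my side, not
something the patch does.

```c
#include <stddef.h>
#include <string.h>

enum demo_cmd {
	DEMO_CMD_ON,
	DEMO_CMD_OFF,
	DEMO_CMD_WHITELIST,
	DEMO_CMD_BLACKLIST,
	DEMO_CMD_FILTER, /* "!<function>": add function to the filter list */
	DEMO_CMD_INVALID,
};

/*
 * Parse an already stripped, NUL-terminated command. For DEMO_CMD_FILTER,
 * *func points at the function name inside arg; otherwise it is NULL.
 */
static enum demo_cmd demo_parse(const char *arg, const char **func)
{
	*func = NULL;

	if (!strcmp(arg, "on"))
		return DEMO_CMD_ON;
	if (!strcmp(arg, "off"))
		return DEMO_CMD_OFF;
	if (!strcmp(arg, "whitelist"))
		return DEMO_CMD_WHITELIST;
	if (!strcmp(arg, "blacklist"))
		return DEMO_CMD_BLACKLIST;
	if (arg[0] == '!' && arg[1] != '\0') {
		*func = &arg[1];
		return DEMO_CMD_FILTER;
	}
	return DEMO_CMD_INVALID;
}
```
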
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
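
On the masks: as posted, WATCHPOINT_SIZE_MASK spans WATCHPOINT_SIZE_BITS + 1
bits and WATCHPOINT_ADDR_MASK spans WATCHPOINT_ADDR_BITS - 1 bits, so one
address bit appears to go unused. The round trip still works because the size
is shifted by WATCHPOINT_ADDR_BITS. Below is a userspace sketch of the
encoding with both masks derived from the address width, so that the three
fields tile the word exactly. It assumes 64-bit words, 4 KiB pages and
KCSAN_CHECK_ADJACENT == 1 (hence 14 size bits); the demo_* names are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: bits_per(2 * 4096) == 14 size bits on a 64-bit word. */
#define DEMO_SIZE_BITS 14
#define DEMO_ADDR_BITS (64 - 1 - DEMO_SIZE_BITS)

#define DEMO_WRITE_MASK (UINT64_C(1) << 63)
#define DEMO_ADDR_MASK ((UINT64_C(1) << DEMO_ADDR_BITS) - 1)
#define DEMO_SIZE_MASK (~(DEMO_WRITE_MASK | DEMO_ADDR_MASK))

#define DEMO_INVALID_WATCHPOINT 0
#define DEMO_CONSUMED_WATCHPOINT 1

/* Pack is-write flag, size and truncated address into one word. */
static uint64_t demo_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? DEMO_WRITE_MASK : 0) |
	       ((uint64_t)size << DEMO_ADDR_BITS) |
	       (addr & DEMO_ADDR_MASK);
}

/* Unpack a watchpoint; returns false for the two reserved values. */
static bool demo_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			bool *is_write)
{
	if (wp == DEMO_INVALID_WATCHPOINT || wp == DEMO_CONSUMED_WATCHPOINT)
		return false;

	*addr_masked = wp & DEMO_ADDR_MASK;
	*size = (size_t)((wp & DEMO_SIZE_MASK) >> DEMO_ADDR_BITS);
	*is_write = (wp & DEMO_WRITE_MASK) != 0;
	return true;
}
```

Only the low DEMO_ADDR_BITS bits of the address survive, which is why report.c
has to compare the full addresses again before printing.
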
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
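
To illustrate the two helpers above: the slot is the page number modulo the
number of watchpoints, and two accesses match if their byte ranges overlap. A
small userspace sketch, assuming 4 KiB pages and 64 watchpoints (the demo_*
names are not from the patch):

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_NUM_WATCHPOINTS 64

/* All addresses within one page map to the same slot. */
static int demo_slot(unsigned long addr)
{
	return (addr / DEMO_PAGE_SIZE) % DEMO_NUM_WATCHPOINTS;
}

/* True if [addr1, addr1 + size1) and [addr2, addr2 + size2) overlap. */
static bool demo_matching_access(unsigned long addr1, size_t size1,
				 unsigned long addr2, size_t size2)
{
	unsigned long end_range1 = addr1 + size1 - 1;
	unsigned long end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}
```

An 8-byte access at 0x1ffc straddles two pages, so it overlaps an access at
0x2000 although the two start addresses fall into different slots; this is the
case KCSAN_CHECK_ADJACENT is meant to cover.
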
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \

I think we should define the unaligned versions as an __alias of the aligned
ones here. That would make the code shorter, reduce icache pressure, and
eliminate the need to whitelist them in objtool (currently they are not
whitelisted).
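
Roughly along these lines. This is an untested userspace sketch with made-up
demo_* names, relying on the GCC/Clang alias attribute on ELF targets; in the
kernel it would use the existing __alias() macro, and I would expect each alias
to still need its own EXPORT_SYMBOL():

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's __alias(). */
#define demo_alias(symbol) __attribute__((alias(#symbol)))

static size_t demo_last_size; /* size passed to the most recent check */
static int demo_calls;        /* number of checks performed */

/* Stand-in for __kcsan_check_read(). */
static void demo_check_read(void *ptr, size_t size)
{
	(void)ptr;
	demo_last_size = size;
	demo_calls++;
}

/*
 * Define the aligned entry point once and declare the unaligned one as a
 * second symbol for the same code, instead of a second function body.
 */
#define DEFINE_DEMO_READ(size)                                                 \
	void demo_read##size(void *ptr)                                        \
	{                                                                      \
		demo_check_read(ptr, size);                                    \
	}                                                                      \
	void demo_unaligned_read##size(void *ptr)                              \
		demo_alias(demo_read##size)

DEFINE_DEMO_READ(2);
DEFINE_DEMO_READ(4);
```
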

> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although larger number of watchpoints may not even
> + * be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
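
The skip logic above is a substring match on the symbolized frames. A userspace
sketch that takes symbol names directly instead of formatting addresses with
%ps (the function name and the sample frames in the note below are
hypothetical):

```c
#include <string.h>

/*
 * Return the index of the first frame that does not look like KCSAN/TSAN
 * runtime or a READ_ONCE()/WRITE_ONCE() helper. Returns num_entries if
 * every frame matches.
 */
static int demo_stack_skipnr(const char *const symbols[], int num_entries)
{
	int skip = 0;

	for (; skip < num_entries; ++skip) {
		const char *sym = symbols[skip];

		if (!strstr(sym, "csan_") && !strstr(sym, "tsan_") &&
		    !strstr(sym, "_once_size"))
			break;
	}
	return skip;
}
```

For a trace of kcsan_report, __kcsan_setup_watchpoint, __tsan_read4,
__read_once_size, tcp_poll this yields 4. If every frame matches, the result
equals num_entries, so a caller that indexes the array with it probably wants
to check for that case first.
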
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

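The expectations in test_matching_access() in the patch above are consistent with matching_access() being a byte-range overlap check. Its definition is not shown in this message (presumably it comes from the included encoding.h), so the following user-space C sketch only illustrates the assumed semantics; the name ranges_overlap is hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for matching_access(): returns true if the byte
 * ranges [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least
 * one byte. Assumes both sizes are non-zero and neither range wraps around.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
                           unsigned long addr2, size_t size2)
{
        unsigned long last1 = addr1 + size1 - 1; /* last byte of range 1 */
        unsigned long last2 = addr2 + size2 - 1; /* last byte of range 2 */

        return addr1 <= last2 && addr2 <= last1;
}
```

Under this assumption, the five cases in test_matching_access() behave as the selftest expects: (10, 1, 10, 1), (10, 2, 11, 1) and (10, 1, 9, 2) overlap, while (10, 1, 11, 1) and (9, 1, 10, 1) do not.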

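print_summary() in the patch above orders the two racing functions before printing the bug title, so that the same pair of functions yields the same title regardless of which thread set up the watchpoint. A minimal user-space sketch of that idea follows. It assumes plain function-name strings instead of symbolized addresses (the patch compares "%pS" strings and prints "%ps"), and the helper name format_race_title is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "data-race in A / B" into out, where A and B
 * are the two function names in strcmp() order. Mirrors the selection logic
 * in print_summary(), but operates on plain strings.
 */
static int format_race_title(char *out, size_t len,
                             const char *this_fn, const char *other_fn)
{
        int cmp = strcmp(other_fn, this_fn);

        return snprintf(out, len, "data-race in %s / %s",
                        cmp < 0 ? other_fn : this_fn,
                        cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two arguments produces the same string, which is what keeps titles stable across reports of the same race.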

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
@ 2019-10-23 12:05     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23 12:05 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows developers to determine the possible executions of
> +concurrent code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a short delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address conflicts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used even in
> + * compilation units that selectively disable KCSAN but still must use KCSAN
> + * to validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
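
A minimal userspace sketch of why the nestable counter and the flat flag
are kept separate, assuming the begin/end logic mirrors
kcsan_begin_atomic()/kcsan_end_atomic() in core.c further down. The
names ctx_model, begin_atomic, end_atomic and in_atomic are hypothetical
stand-ins, and the mismatch warning is left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the two region fields of struct kcsan_ctx. */
struct ctx_model {
	int atomic_region;       /* depth of nestable atomic regions */
	bool atomic_region_flat; /* true while inside a flat region */
};

static void begin_atomic(struct ctx_model *c, bool nest)
{
	if (nest)
		++c->atomic_region;
	else
		c->atomic_region_flat = true;
}

static void end_atomic(struct ctx_model *c, bool nest)
{
	if (nest)
		--c->atomic_region;
	else
		c->atomic_region_flat = false;
}

/* Same condition that is_atomic() applies to the context. */
static bool in_atomic(const struct ctx_model *c)
{
	return c->atomic_region > 0 || c->atomic_region_flat;
}

static void demo_nested_regions(void)
{
	struct ctx_model c = { 0, false };

	begin_atomic(&c, true);  /* e.g. seqlock writer critical section */
	begin_atomic(&c, false); /* e.g. seqlock reader section inside it */
	end_atomic(&c, false);   /* flat region ends ... */
	assert(in_atomic(&c));   /* ... outer nestable region survives */
	end_atomic(&c, true);
	assert(!in_atomic(&c));
}
```

With a single shared counter or flag, ending the inner flat region would
also end the outer region, which is the case the comment describes.
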
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
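
A small userspace sketch of the probe order this macro produces. The
values KCSAN_CHECK_ADJACENT == 1 and KCSAN_NUM_WATCHPOINTS == 64 are
assumptions for illustration (the real definitions are not in this
hunk), and probe_order() is a hypothetical helper:

```c
#include <assert.h>

/* Assumed values for illustration only. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64

#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                              \
	(((slot) + ((((i) + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) - \
		    KCSAN_CHECK_ADJACENT)) %                           \
	 KCSAN_NUM_WATCHPOINTS)

/* Fill out[] with the slots probed for a given home slot, in order. */
static void probe_order(int slot, int out[CHECK_NUM_SLOTS])
{
	for (int i = 0; i < CHECK_NUM_SLOTS; ++i)
		out[i] = SLOT_IDX(slot, i);
}

static void demo_probe_order(void)
{
	int order[CHECK_NUM_SLOTS];

	probe_order(10, order);
	assert(order[0] == 10); /* the address slot itself */
	assert(order[1] == 11); /* then the right neighbour */
	assert(order[2] == 9);  /* then the left neighbour */
}
```

The sketch deliberately uses a slot in the middle of the table. With
plain int operands, C's % keeps the sign of the dividend, so the
behaviour for the left neighbour of slot 0 depends on the types of the
real constants.
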
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
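
The insert/consume/remove protocol above can be modelled in userspace
with C11 atomics. This is only a sketch: INVALID_WATCHPOINT == 0 follows
from the "zero-initialized" comment, but the value of
CONSUMED_WATCHPOINT and the wp_* helper names are assumptions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT  0L /* zero-initialized state */
#define CONSUMED_WATCHPOINT 1L /* assumed value, for illustration */

/* Claim a free slot; fails if the slot is already in use. */
static bool wp_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Mark the watchpoint consumed, but only if it is still the one we saw. */
static bool wp_try_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &encoded, CONSUMED_WATCHPOINT,
		memory_order_relaxed, memory_order_relaxed);
}

/* Free the slot; true if nobody consumed it while it was armed. */
static bool wp_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}

static void demo_watchpoint_lifecycle(void)
{
	atomic_long wp;
	const long enc = 0x1234; /* arbitrary stand-in for an encoding */

	atomic_init(&wp, INVALID_WATCHPOINT);

	assert(wp_insert(&wp, enc));       /* slot was free */
	assert(!wp_insert(&wp, enc));      /* slot now occupied */
	assert(wp_try_consume(&wp, enc));  /* racing access hits it */
	assert(!wp_try_consume(&wp, enc)); /* only one consumer wins */
	assert(!wp_remove(&wp));           /* owner learns of the race */

	assert(wp_insert(&wp, enc));       /* slot is reusable */
	assert(wp_remove(&wp));            /* untouched: no race seen */
}
```

Encoding the whole watchpoint in one word is what lets each transition
be a single cmpxchg or xchg, so no lock is needed around the table.
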
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
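
A userspace sketch of the counter-based sampling in should_watch(),
ignoring the is_atomic() check. WATCH_SKIP_INST is a hypothetical small
stand-in for CONFIG_KCSAN_WATCH_SKIP_INST, and a plain global replaces
the per-CPU kcsan_skip:

```c
#include <assert.h>
#include <stdbool.h>

#define WATCH_SKIP_INST 4 /* hypothetical value, for illustration */

static unsigned long skip_counter; /* stands in for per-CPU kcsan_skip */

/* Increment first, then test: matches this_cpu_inc_return() semantics. */
static bool should_watch_model(void)
{
	return (++skip_counter % WATCH_SKIP_INST) == 0;
}

/* Count how many of the next `accesses` accesses would be watched. */
static int count_watched(int accesses)
{
	int watched = 0;

	for (int i = 0; i < accesses; ++i)
		watched += should_watch_model();
	return watched;
}

static void demo_sampling(void)
{
	skip_counter = 0;
	assert(count_watched(3) == 0); /* accesses 1..3 are skipped */
	assert(count_watched(1) == 1); /* the 4th access is watched */
	assert(count_watched(8) == 2); /* then every 4th after that */
}
```

In this single-counter model the selection is fully periodic; the
unpredictability the comment mentions comes from many contexts sharing
each per-CPU counter in the real kernel.
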
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();

Why do we need user_access_save?

> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();

Why do we need user_access_save?

> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
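
The decision at the end of __kcsan_setup_watchpoint() can be summarised
as a pure function. This is a sketch only; race_outcome and classify()
are hypothetical names, and counters and Kconfig gating are omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum race_outcome {
	NO_RACE,
	RACE_WITH_INSTRUMENTED_ACCESS, /* kcsan_report_race_setup */
	RACE_UNKNOWN_ORIGIN,           /* kcsan_report_race_unknown_origin */
};

/*
 * watchpoint_untouched: result of remove_watchpoint().
 * before/after: value read before and after the delay.
 */
static enum race_outcome classify(bool watchpoint_untouched,
				  uint64_t before, uint64_t after)
{
	if (!watchpoint_untouched)
		return RACE_WITH_INSTRUMENTED_ACCESS;
	if (before != after)
		return RACE_UNKNOWN_ORIGIN;
	return NO_RACE;
}

static void demo_classify(void)
{
	/* Another instrumented access consumed the watchpoint. */
	assert(classify(false, 1, 1) == RACE_WITH_INSTRUMENTED_ACCESS);
	/* Consumed takes priority even if the value also changed. */
	assert(classify(false, 1, 2) == RACE_WITH_INSTRUMENTED_ACCESS);
	/* Value changed with no watchpoint hit, e.g. a device write. */
	assert(classify(true, 1, 2) == RACE_UNKNOWN_ORIGIN);
	/* Nothing observed during the delay window. */
	assert(classify(true, 7, 7) == NO_RACE);
}
```
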
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true if list is a whitelist, false if blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *lhs, const void *rhs)
> +{
> +       const unsigned long a = *(const unsigned long *)lhs;
> +       const unsigned long b = *(const unsigned long *)rhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
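
The lookup above sorts lazily, does a binary search, and then inverts the
result in whitelist mode. A minimal userspace sketch of the same logic, using
qsort()/bsearch() from libc in place of the kernel's sort()/bsearch(); the
struct and function names here are made up for illustration and locking is
left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for report_filterlist. */
struct filterlist {
	unsigned long *addrs;
	size_t used;
	bool sorted;
	bool whitelist;
};

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/* Returns true if a report for func_addr should be skipped. */
static bool filterlist_skip(struct filterlist *fl, unsigned long func_addr)
{
	bool found;

	assert(fl != NULL);
	if (fl->used == 0)
		return false; /* an empty list never filters, even as whitelist */

	/* Sort once after the last insertion, then binary-search. */
	if (!fl->sorted) {
		qsort(fl->addrs, fl->used, sizeof(*fl->addrs), cmp_addrs);
		fl->sorted = true;
	}
	found = bsearch(&func_addr, fl->addrs, fl->used, sizeof(*fl->addrs),
			cmp_addrs) != NULL;

	/* Blacklist: skip listed functions. Whitelist: skip all others. */
	return fl->whitelist ? !found : found;
}
```

Note that, as in the patch, an empty list in whitelist mode still reports
everything.
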
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used] = addr;
> +       report_filterlist.used++;
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
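
Both kvmalloc_array() results above are used without a NULL check, and the
GFP_KERNEL allocations happen while report_filterlist_lock is held with
interrupts disabled. A rough userspace sketch of the grow-by-doubling step
with the failure path handled; realloc() stands in for the kernel allocator,
and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable address list; names are not from the patch. */
struct addr_list {
	unsigned long *addrs;
	size_t size; /* capacity in elements */
	size_t used; /* number of elements used */
};

/*
 * Appends addr, doubling the capacity when the array is full.
 * Returns false on allocation failure, leaving the list unchanged.
 */
static bool addr_list_append(struct addr_list *l, unsigned long addr)
{
	assert(l != NULL);
	if (l->used == l->size) {
		size_t new_size = l->size ? l->size * 2 : 8;
		unsigned long *new_addrs;

		new_addrs = realloc(l->addrs, new_size * sizeof(*new_addrs));
		if (!new_addrs)
			return false; /* old array is still valid */
		l->addrs = new_addrs;
		l->size = new_size;
	}
	l->addrs[l->used++] = addr;
	return true;
}
```

In the kernel the allocation would also have to move outside the spinlock or
use a non-sleeping GFP mask; the sketch does not model that.
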
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EFAULT;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
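
The strncmp() calls above only compare a prefix, so an input such as "once"
would be accepted as "on". If exact matching is what is intended, the command
dispatch could look roughly like the sketch below. This is a userspace
illustration only; the enum and function names are invented, and rejecting a
bare "!" is an extra assumption that the patch does not make:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_cmd {
	SKETCH_ON,
	SKETCH_OFF,
	SKETCH_WHITELIST,
	SKETCH_BLACKLIST,
	SKETCH_ADD_FUNC,
	SKETCH_INVALID,
};

/*
 * Classifies an already stripped command string. For SKETCH_ADD_FUNC,
 * *func points at the function name following the '!'.
 */
static enum sketch_cmd sketch_parse(const char *arg, const char **func)
{
	assert(arg != NULL && func != NULL);
	*func = NULL;

	if (!strcmp(arg, "on"))
		return SKETCH_ON;
	if (!strcmp(arg, "off"))
		return SKETCH_OFF;
	if (!strcmp(arg, "whitelist"))
		return SKETCH_WHITELIST;
	if (!strcmp(arg, "blacklist"))
		return SKETCH_BLACKLIST;
	if (arg[0] == '!' && arg[1] != '\0') {
		*func = &arg[1];
		return SKETCH_ADD_FUNC;
	}
	return SKETCH_INVALID;
}
```
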
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding discards the upper (1 for is-write + SIZE_BITS) address bits;
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both of these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, WATCHPOINT_ADDR_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(WATCHPOINT_ADDR_BITS - 1, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
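
To check the bit layout, here is a small userspace round-trip of the
encode/decode pair. It assumes a 64-bit word and 14 size bits (which is what
bits_per() should give for 4 KiB pages with KCSAN_CHECK_ADJACENT == 1), and it
derives the size and address masks directly from the address bit count. All
names are local to the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for this sketch: 64-bit word, 14 size bits. */
#define WORD_BITS 64
#define SIZE_BITS 14
#define ADDR_BITS (WORD_BITS - 1 - SIZE_BITS) /* 49 */

#define WRITE_MASK (UINT64_C(1) << (WORD_BITS - 1))
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define SIZE_MASK (~(WRITE_MASK | ADDR_MASK))

/* Packs is-write (top bit), size, and the low address bits into one word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size >= 1 && size < (UINT64_C(1) << SIZE_BITS));
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

/* Returns false for the two reserved values (invalid and consumed). */
static bool wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

Because size is always at least 1, an encoded watchpoint has a non-zero size
field and cannot collide with the reserved values 0 and 1.
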
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
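
For reference, the slot mapping and the overlap test behave as follows in a
quick userspace sketch. It assumes 4 KiB pages and the 64 watchpoints from
kcsan.h; the sketch_* names are invented:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
#define SKETCH_NUM_WATCHPOINTS 64 /* KCSAN_NUM_WATCHPOINTS in the patch */

/* All addresses within one page share a slot; slots wrap every 64 pages. */
static int sketch_slot(unsigned long addr)
{
	return (int)((addr / SKETCH_PAGE_SIZE) % SKETCH_NUM_WATCHPOINTS);
}

/* True if [addr1, addr1 + size1) and [addr2, addr2 + size2) intersect. */
static bool sketch_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	assert(size1 > 0 && size2 > 0);
	return addr1 <= addr2 + size2 - 1 && addr2 <= addr1 + size1 - 1;
}
```

So an 8-byte access at 0x100 matches a 4-byte access at 0x104, while two
4-byte accesses at 0x100 and 0x104 do not overlap.
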
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
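
For readers less familiar with the pattern: the macro above uses token pasting
to stamp out one entry point per access size. A stripped-down userspace
illustration, where sketch_check_read() is a made-up stand-in for
__kcsan_check_read() that just records its arguments:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __kcsan_check_read(): records the last check. */
static const void *last_ptr;
static size_t last_size;

static void sketch_check_read(const void *ptr, size_t size)
{
	assert(size > 0);
	last_ptr = ptr;
	last_size = size;
}

/* Token pasting builds one function per access size. */
#define DEFINE_SKETCH_READ(size)                                               \
	void sketch_read##size(void *ptr)                                      \
	{                                                                      \
		sketch_check_read(ptr, size);                                  \
	}

DEFINE_SKETCH_READ(1)
DEFINE_SKETCH_READ(8)
```

Here DEFINE_SKETCH_READ(8) expands to a function sketch_read8() that forwards
the pointer with a size of 8.
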
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads have to be serialized to print the reports
> + * anyway, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
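
The skip logic above matches on substrings of the symbolized frame. A small
userspace sketch of the same loop, operating on plain strings instead of
"%ps" output (the function name and the example frames are made up):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first frame that is not sanitizer-internal. */
static int sketch_skipnr(const char *const names[], int num_entries)
{
	int skip = 0;

	assert(names != NULL || num_entries == 0);
	for (; skip < num_entries; ++skip) {
		if (!strstr(names[skip], "csan_") &&
		    !strstr(names[skip], "tsan_") &&
		    !strstr(names[skip], "_once_size"))
			break;
	}
	return skip; /* equals num_entries if every frame was skipped */
}
```

When every frame matches, the result equals num_entries, so a caller that
indexes the array with it may want to check for that case first.
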
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

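The test_matching_access() cases in the quoted selftest pin down when two
accesses are considered to match: their byte ranges must share at least one
byte. As a minimal user-space sketch of that rule (ranges_overlap() is a
hypothetical name and is not the matching_access() implementation from the
patch; it assumes non-zero sizes and ranges that do not wrap around):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: returns true if the byte
 * ranges [addr1, addr1 + size1) and [addr2, addr2 + size2) share at least
 * one byte. Assumes both sizes are non-zero and neither range wraps.
 */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);
	end1 = addr1 + size1 - 1; /* last byte of the first range */
	end2 = addr2 + size2 - 1; /* last byte of the second range */

	/* Overlap iff each range starts no later than the other one ends. */
	return addr1 <= end2 && addr2 <= end1;
}
```

With this definition the five selftest inputs give the results the test
expects: (10, 1, 10, 1), (10, 2, 11, 1) and (10, 1, 9, 2) overlap, while
(10, 1, 11, 1) and (9, 1, 10, 1) do not.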

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 12:05     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23 12:05 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN = y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting data-races where either one of the top stackframes is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated where it was not possible to determine the other
> +racing thread, but a race was inferred due to the data-value of the watched
> +memory location having changed. These can occur either due to missing
> +instrumentation or e.g. DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent code,
> +and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
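
To make the "soft watchpoint in a long" idea concrete, here is a minimal
user-space sketch. The bit layout, the WP_* constants and the wp_encode() /
wp_decode() names are all hypothetical and chosen for illustration; the real
layout lives in kernel/kcsan/encoding.h and may well differ. The sketch uses
uint64_t instead of long so that it behaves the same on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not from encoding.h). */
#define WP_WRITE_BIT  (UINT64_C(1) << 63)	/* bit 63: access is a write */
#define WP_SIZE_BITS  14			/* bits 49..62: access size */
#define WP_SIZE_SHIFT (63 - WP_SIZE_BITS)
#define WP_SIZE_MASK  ((UINT64_C(1) << WP_SIZE_BITS) - 1)
#define WP_ADDR_MASK  ((UINT64_C(1) << WP_SIZE_SHIFT) - 1) /* bits 0..48 */
#define WP_INVALID    UINT64_C(0)		/* marks an empty slot */

/* Pack access type, size and the low address bits into one word. */
static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	/* size > 0 guarantees the result never equals WP_INVALID. */
	assert(size > 0 && size <= WP_SIZE_MASK);
	return (is_write ? WP_WRITE_BIT : 0) |
	       ((uint64_t)size << WP_SIZE_SHIFT) |
	       (addr & WP_ADDR_MASK);
}

/* Unpack a watchpoint; returns false for an empty slot. */
static bool wp_decode(uint64_t wp, uint64_t *masked_addr, size_t *size,
		      bool *is_write)
{
	if (wp == WP_INVALID)
		return false;
	*masked_addr = wp & WP_ADDR_MASK;
	*size = (size_t)((wp >> WP_SIZE_SHIFT) & WP_SIZE_MASK);
	*is_write = (wp & WP_WRITE_BIT) != 0;
	return true;
}
```

Because the whole watchpoint fits in one word, a slot can be read or replaced
with a single atomic operation, which is what makes a lock-free fast path
plausible. Only the masked address survives the round trip, which is why the
selftest compares against addr & WATCHPOINT_ADDR_MASK.
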
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for some delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after the delay; if the values mismatch, we infer a race of unknown origin.
> +
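
The three steps above can be sketched as a single-threaded user-space toy.
Everything here is hypothetical and simplified: one watchpoint slot instead
of an array, no sampling, no atomics, and a callback standing in for the
delay so that a "concurrent" access can be simulated deterministically. It is
not the code in kernel/kcsan/core.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single watchpoint slot; the patch uses an array of longs. */
struct watchpoint {
	bool active;
	uintptr_t addr;
	size_t size;
	bool is_write;
};
static struct watchpoint slot;

/* Step 1: does this access race with the currently watched access? */
static bool conflicts_with_watchpoint(const void *ptr, size_t size,
				      bool is_write)
{
	uintptr_t addr = (uintptr_t)ptr;

	if (!slot.active)
		return false;
	if (!slot.is_write && !is_write)
		return false; /* two reads never race */
	return addr < slot.addr + slot.size && slot.addr < addr + size;
}

/*
 * Steps 2 and 3: set up a watchpoint, stall, then compare the data value.
 * Returns true if the value changed during the delay, i.e. a race of
 * unknown origin would be inferred.
 */
static bool watch_and_recheck(const void *ptr, size_t size, bool is_write,
			      void (*delay)(void))
{
	unsigned char before[8];
	bool changed;

	assert(size <= sizeof(before));
	memcpy(before, ptr, size);		/* value before the delay */
	slot = (struct watchpoint){ true, (uintptr_t)ptr, size, is_write };
	delay();				/* other accesses happen here */
	changed = memcmp(before, ptr, size) != 0;
	slot.active = false;			/* remove the watchpoint */
	return changed;
}

/* Demo state and callbacks that play the role of another thread. */
static int shared;
static bool saw_conflict;
static void idle(void) { }
static void racing_writer(void) { shared++; }
static void racing_reader(void)
{
	saw_conflict = conflicts_with_watchpoint(&shared, sizeof(shared), false);
}
```

Calling watch_and_recheck(&shared, sizeof(shared), false, racing_writer)
models a write that bypassed instrumentation (for example DMA) and is only
caught by the value comparison, while the racing_reader callback models an
instrumented access that finds the watchpoint in step 1.
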
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
> +accesses.
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. These may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address. Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from the address slot itself,
> + * followed by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
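
To convince myself of the probe order, here is a small userspace sketch
of the macro above. The two constants are assumed values chosen for
illustration (the real definitions live in kernel/kcsan/kcsan.h, which
is not quoted here), and probe_order() is a hypothetical helper.

```c
#include <assert.h>

/* Assumed values for illustration only; see kernel/kcsan/kcsan.h. */
#define KCSAN_CHECK_ADJACENT 1
#define KCSAN_NUM_WATCHPOINTS 64

/* Copied from the patch. */
#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i)                                                      \
	((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
		  KCSAN_CHECK_ADJACENT)) %                                     \
	 KCSAN_NUM_WATCHPOINTS)

/* Hypothetical helper: record the indices probed for a given home slot. */
static void probe_order(int slot, int out[CHECK_NUM_SLOTS])
{
	int i;

	assert(slot >= 0 && slot < KCSAN_NUM_WATCHPOINTS);
	for (i = 0; i < CHECK_NUM_SLOTS; ++i)
		out[i] = SLOT_IDX(slot, i);
}
```

For a home slot of 10 this yields 10, 11, 9, which matches the comment.
If both constants really are plain signed ints, as assumed here, the
left neighbour of slot 0 comes out as -1 rather than wrapping to the
last slot, because % in C keeps the sign of the dividend. That may be
worth checking against the real definitions.
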
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
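
The three helpers above form a small lock-free state machine per slot
(invalid -> encoded -> consumed -> invalid). A minimal userspace model
with C11 atomics, as I understand it; the wp_* names are mine, and
<stdatomic.h> stands in for the kernel's atomic_long_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WATCHPOINT 0L
#define CONSUMED_WATCHPOINT 1L

/* Owner claims an empty slot; fails if the slot is already in use. */
static bool wp_insert(atomic_long *wp, long encoded)
{
	long expect = INVALID_WATCHPOINT;

	assert(encoded != INVALID_WATCHPOINT && encoded != CONSUMED_WATCHPOINT);
	return atomic_compare_exchange_strong_explicit(
		wp, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Racing accessor marks the exact watchpoint it observed as consumed. */
static bool wp_try_consume(atomic_long *wp, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		wp, &encoded, CONSUMED_WATCHPOINT,
		memory_order_relaxed, memory_order_relaxed);
}

/* Owner releases the slot; false means it was consumed, i.e. a race. */
static bool wp_remove(atomic_long *wp)
{
	return atomic_exchange_explicit(wp, INVALID_WATCHPOINT,
					memory_order_relaxed) !=
	       CONSUMED_WATCHPOINT;
}
```

In this model only one racing accessor can win the consume, and the
owner learns about it from the value it swaps out in wp_remove(), so no
extra flag is needed to hand the result back.
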
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
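
For reference, the sampling policy boils down to "every Nth non-atomic
access on this CPU". A rough userspace model follows; the skip value is
an assumed stand-in for CONFIG_KCSAN_WATCH_SKIP_INST, and a thread-local
counter replaces the per-CPU one.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; stands in for CONFIG_KCSAN_WATCH_SKIP_INST. */
#define WATCH_SKIP_INST 2000UL

static_assert(WATCH_SKIP_INST > 0, "modulo by zero otherwise");

/* Per-CPU in the patch; one thread-local counter is enough for a model. */
static _Thread_local unsigned long skip_count;

static bool should_watch_model(bool access_is_atomic)
{
	/* Atomics are never watched and do not advance the counter. */
	if (access_is_atomic)
		return false;

	return (++skip_count % WATCH_SKIP_INST) == 0;
}

/* Hypothetical helper: how many of n plain accesses get a watchpoint. */
static unsigned long count_watched(unsigned long n)
{
	unsigned long i, watched = 0;

	for (i = 0; i < n; ++i)
		watched += should_watch_model(false);
	return watched;
}
```

The model is fully deterministic per thread; in the kernel the
unpredictability comes from interrupts and scheduling interleaving
accesses on each CPU, as the comment above says.
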
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
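
The nesting and underflow handling took me a moment to follow, so here
is how I read it, as a userspace model. struct ctx_model and the
model_* functions are hypothetical stand-ins; the warnings field counts
what would be a WARN() in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the 'disable' field of struct kcsan_ctx. */
struct ctx_model {
	int disable;  /* nesting depth; 0 means checks are active */
	int warnings; /* counts what would be WARN()s in the kernel */
};

static void model_disable(struct ctx_model *c)
{
	assert(c->disable >= 0);
	++c->disable;
}

static void model_enable(struct ctx_model *c)
{
	if (c->disable-- == 0) {
		c->disable = 0; /* undo the underflow, as the patch does */
		++c->warnings;
	}
}

static bool model_is_enabled(const struct ctx_model *c)
{
	return c->disable == 0;
}
```

Starting from disable = 1 (as in init_task), the single enable in
kcsan_init() brings the counter to 0; an unbalanced enable after that
warns once and leaves the counter at 0.
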
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();

Why do we need user_access_save?

> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();

Why do we need user_access_save?

> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
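
The "race of unknown origin" path is a snapshot, delay, re-read
comparison. A simplified userspace sketch of just that part, where a
callback stands in for udelay() plus whatever else touches the memory
in the meantime; all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load 1, 2, 4 or 8 bytes; other sizes are not compared and read as 0. */
static uint64_t load_sized(const volatile void *ptr, size_t size)
{
	switch (size) {
	case 1: return *(const volatile uint8_t *)ptr;
	case 2: return *(const volatile uint16_t *)ptr;
	case 4: return *(const volatile uint32_t *)ptr;
	case 8: return *(const volatile uint64_t *)ptr;
	default: return 0;
	}
}

/* True if the value is the same before and after the delay window. */
static bool unchanged_across(const volatile void *ptr, size_t size,
			     void (*window)(void *), void *arg)
{
	uint64_t before;

	assert(ptr != NULL && window != NULL);
	before = load_sized(ptr, size);
	window(arg);
	return before == load_sized(ptr, size);
}

/* Example windows: one idle, one simulating an uninstrumented writer. */
static void window_idle(void *arg)
{
	(void)arg;
}

static void window_store_u32(void *arg)
{
	*(volatile uint32_t *)arg += 1;
}
```

As in the patch, sizes other than 1, 2, 4 and 8 always compare equal, so
they can never produce an unknown-origin report.
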
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* if list is a blacklist or whitelist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
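
One observation on the command parsing above: every comparison only looks at the first sizeof(keyword) - 1 bytes of arg, so any input that merely starts with a keyword is accepted. Below is a minimal userspace sketch of the same if/else chain, assuming I read the code correctly; parse_cmd(), has_prefix() and the enum are hypothetical names used only for illustration.

```c
#include <string.h>

/* Hypothetical command codes for this userspace model. */
enum model_cmd {
	CMD_ON, CMD_OFF, CMD_WHITELIST, CMD_BLACKLIST, CMD_FILTER_ADD,
	CMD_INVALID
};

/* Same comparison as in the patch: only the first strlen(kw) bytes count. */
static int has_prefix(const char *arg, const char *kw)
{
	return strncmp(arg, kw, strlen(kw)) == 0;
}

/* Mirrors the if/else chain in debugfs_write(), in the same order. */
static enum model_cmd parse_cmd(const char *arg)
{
	if (has_prefix(arg, "on"))
		return CMD_ON;
	if (has_prefix(arg, "off"))
		return CMD_OFF;
	if (has_prefix(arg, "whitelist"))
		return CMD_WHITELIST;
	if (has_prefix(arg, "blacklist"))
		return CMD_BLACKLIST;
	if (arg[0] == '!')
		return CMD_FILTER_ADD;
	return CMD_INVALID;
}
```

With this model, parse_cmd("online") yields CMD_ON and parse_cmd("offset") yields CMD_OFF. If exact matches are intended, a plain strcmp() on the stripped string might be the simpler choice.
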
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding discards the upper (1 for is-write + SIZE_BITS) address bits;
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, for a false positive to be observable, 2 things need to happen:
> + *
> + *     1. accesses to different addresses with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots.
> + *
> + * Both are assumed to be very unlikely. However, in case it still happens,
> + * the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
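
To check my understanding of the bit layout, here is a userspace model of the encode/decode pair. It is only a sketch under stated assumptions: a 64-bit long, PAGE_SIZE of 4096 and KCSAN_CHECK_ADJACENT of 1, so MAX_ENCODABLE_SIZE is 8192 and bits_per(8192) is 14. All M_* and model_* names are mine, not from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed configuration: 64-bit long, 4 KiB pages, 1 adjacent slot. */
#define M_BITS_PER_LONG 64
#define M_SIZE_BITS 14 /* bits_per(8192), i.e. ilog2(8192) + 1 */
#define M_ADDR_BITS (M_BITS_PER_LONG - 1 - M_SIZE_BITS) /* 49 */

/* Userspace stand-in for the kernel's GENMASK(h, l) on 64-bit values. */
#define M_GENMASK(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* Same expressions as in the patch, with the assumed constants. */
#define M_WRITE_MASK (UINT64_C(1) << (M_BITS_PER_LONG - 1))
#define M_SIZE_MASK \
	M_GENMASK(M_BITS_PER_LONG - 2, M_BITS_PER_LONG - 2 - M_SIZE_BITS)
#define M_ADDR_MASK M_GENMASK(M_BITS_PER_LONG - 3 - M_SIZE_BITS, 0)

static uint64_t model_encode(uint64_t addr, size_t size, bool is_write)
{
	return (is_write ? M_WRITE_MASK : 0) |
	       ((uint64_t)size << M_ADDR_BITS) |
	       (addr & M_ADDR_MASK);
}

static bool model_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
			 bool *is_write)
{
	/* 0 and 1 are the INVALID and CONSUMED markers in the patch. */
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & M_ADDR_MASK;
	*size = (size_t)((wp & M_SIZE_MASK) >> M_ADDR_BITS);
	*is_write = !!(wp & M_WRITE_MASK);
	return true;
}
```

Under these assumptions the round trip works, but the address mask covers bits 47..0 while the size field starts at bit 49. If I read the masks correctly, bit WATCHPOINT_ADDR_BITS - 1 is covered by WATCHPOINT_SIZE_MASK yet never written by the size, so one address bit more than necessary is discarded.
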
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
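
For anyone following along, the slot mapping and the overlap check can be tried in userspace. This is a sketch that assumes PAGE_SIZE is 4096 and uses KCSAN_NUM_WATCHPOINTS = 64 from kcsan.h in this patch; the model_* and M_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define M_PAGE_SIZE 4096UL   /* assumed page size */
#define M_NUM_WATCHPOINTS 64 /* KCSAN_NUM_WATCHPOINTS in this patch */

/* All addresses within one page map to the same slot. */
static int model_slot(unsigned long addr)
{
	return (int)((addr / M_PAGE_SIZE) % M_NUM_WATCHPOINTS);
}

/* Closed-interval overlap test, same logic as matching_access(). */
static bool model_matching_access(unsigned long addr1, size_t size1,
				  unsigned long addr2, size_t size2)
{
	unsigned long end_range1 = addr1 + size1 - 1;
	unsigned long end_range2 = addr2 + size2 - 1;

	return addr1 <= end_range2 && addr2 <= end_range1;
}
```

An 8-byte access at 0x1ffc starts in slot 1 and ends in slot 2, which is the straddling case that KCSAN_CHECK_ADJACENT is meant to cover. Slots wrap around after 64 pages, so 0x40000 maps back to slot 0.
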
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * These are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
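
For readers less familiar with TSAN-style instrumentation, here is a conceptual userspace sketch of what the hooks above receive. The calls are written by hand where a compiler would insert them, and the hooks only count accesses; the exact code a compiler emits will differ, and all model_* names are hypothetical.

```c
#include <stddef.h>

/* Counters standing in for the real checks; illustration only. */
static unsigned long model_reads, model_writes;

static void model_check_read(const void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	model_reads++;
}

static void model_check_write(const void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	model_writes++;
}

/* Same shape as DEFINE_TSAN_READ_WRITE(), without EXPORT_SYMBOL. */
#define DEFINE_MODEL_READ_WRITE(size)                                          \
	static void model_tsan_read##size(void *ptr)                           \
	{                                                                      \
		model_check_read(ptr, size);                                   \
	}                                                                      \
	static void model_tsan_write##size(void *ptr)                          \
	{                                                                      \
		model_check_write(ptr, size);                                  \
	}

DEFINE_MODEL_READ_WRITE(4)

/* Roughly what "*x = *y + 1" looks like once each access is instrumented. */
static int model_instrumented_add(int *x, int *y)
{
	int tmp;

	model_tsan_read4(y); /* hook before the plain load */
	tmp = *y;
	model_tsan_write4(x); /* hook before the plain store */
	*x = tmp + 1;
	return *x;
}
```

Each plain load and store reaches exactly one hook, which is why the empty __tsan_func_entry()/__tsan_func_exit() stubs are enough for the remaining callbacks.
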
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance by reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will:
> + *
> + *     1. avoid excessive contention between watchpoint checks and setup;
> + *     2. allow a larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from the other racing thread to the thread
> + * that set up the watchpoint, which then prints the complete report atomically.
> + * Only one struct is needed, as all threads have to be serialized to print the
> + * reports anyway, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
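
The two-stage comparison in the kcsan_report_race_setup case took me a moment to follow, so here is how I read it, as a userspace sketch. It assumes a 64-bit build where the address mask keeps the low 48 bits (this follows from 4 KiB pages and KCSAN_CHECK_ADJACENT = 1); the model_* names and the enum are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of WATCHPOINT_ADDR_MASK: low 48 address bits are kept. */
#define M_ADDR_MASK UINT64_C(0x0000ffffffffffff)

enum model_match {
	MODEL_MISMATCH,                /* the patch retries in this case */
	MODEL_ENCODING_FALSE_POSITIVE, /* counted, no report printed */
	MODEL_RACE,                    /* the full report is printed */
};

/* Closed-interval overlap test, same logic as matching_access(). */
static bool model_overlap(uint64_t a1, size_t s1, uint64_t a2, size_t s2)
{
	return a1 <= a2 + s2 - 1 && a2 <= a1 + s1 - 1;
}

static enum model_match model_classify(uint64_t other, size_t other_size,
				       uint64_t addr, size_t size)
{
	/* Stage 1: compare the way the watchpoint was encoded. */
	if (!model_overlap(other & M_ADDR_MASK, other_size,
			   addr & M_ADDR_MASK, size))
		return MODEL_MISMATCH;

	/* Stage 2: compare the full addresses. */
	if (!model_overlap(other, other_size, addr, size))
		return MODEL_ENCODING_FALSE_POSITIVE;

	return MODEL_RACE;
}
```

Two addresses that differ only in a discarded upper bit pass stage 1 but fail stage 2, which is the case that increments kcsan_counter_encoding_false_positives.
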
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
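
A small userspace model of the frame skipping above, operating on symbol names directly instead of going through "%ps". The symbol names in the usage note are illustrative and the model_* names are hypothetical.

```c
#include <string.h>

/* Same three substrings that get_stack_skipnr() looks for. */
static int model_is_kcsan_frame(const char *sym)
{
	return strstr(sym, "csan_") != NULL ||
	       strstr(sym, "tsan_") != NULL ||
	       strstr(sym, "_once_size") != NULL;
}

/* Returns the index of the first frame that does not look KCSAN-related. */
static int model_stack_skipnr(const char *const syms[], int num_entries)
{
	int skip = 0;

	for (; skip < num_entries; ++skip) {
		if (!model_is_kcsan_frame(syms[skip]))
			break;
	}
	return skip;
}
```

For a trace such as kcsan_report, __kcsan_check_read, __tsan_read4, __read_once_size, do_work, the model returns 4. Two things I noticed while writing it: any unrelated function whose name happens to contain one of these substrings would be skipped as well, and if every entry matches, the return value equals num_entries, so the caller indexing stack_entries[skipnr] may want a bounds check.
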
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
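
The title ordering in the kcsan_report_race_setup case can be modelled in a few lines. This sketch compares plain function names with strcmp(), whereas the patch compares the "%pS" strings (which include offsets) and prints with "%ps"; model_format_title() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Put the smaller name first, so both racing threads produce one title. */
static int model_format_title(char *buf, size_t len, const char *this_fn,
			      const char *other_fn)
{
	int cmp = strcmp(other_fn, this_fn);

	return snprintf(buf, len, "BUG: KCSAN: data-race in %s / %s",
			cmp < 0 ? other_fn : this_fn,
			cmp < 0 ? this_fn : other_fn);
}
```

Swapping the two arguments yields the same string, which should help tools that deduplicate reports by title.
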
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delays are
> +         simply the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
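
For readers skimming the options, here is a minimal userspace sketch of how the delay and sampling knobs could combine. The helper names (pick_delay_us, should_sample) are hypothetical, and the randomization formula is an assumption; the help texts only say that the delay is the maximum when randomization is off and that 1 in KCSAN_WATCH_SKIP_INST accesses sets up a watchpoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults taken from the quoted Kconfig; everything else is hypothetical. */
#define UDELAY_MAX_TASK      80	/* CONFIG_KCSAN_UDELAY_MAX_TASK */
#define UDELAY_MAX_INTERRUPT 20	/* CONFIG_KCSAN_UDELAY_MAX_INTERRUPT */

/*
 * Pick the stall time in microseconds. Without randomization this is the
 * configured maximum; with it, an assumed formula yields a value in [1, max].
 * 'rnd' is supplied by the caller so the sketch stays deterministic.
 */
static unsigned int pick_delay_us(bool in_task, bool randomize, unsigned int rnd)
{
	unsigned int max = in_task ? UDELAY_MAX_TASK : UDELAY_MAX_INTERRUPT;

	assert(max > 0);
	return randomize ? max - (rnd % max) : max;
}

/* One in 'skip' accesses is selected for watchpoint setup. */
static bool should_sample(unsigned long *counter, unsigned long skip)
{
	assert(skip > 0);
	return ++(*counter) % skip == 0;
}
```

With the default skip value of 2000, a lower value makes should_sample() return true more often, which is the trade-off between detection and overhead that the KCSAN_WATCH_SKIP_INST help text describes.
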
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
@ 2019-10-23 12:05     ` Dmitry Vyukov
From: Dmitry Vyukov @ 2019-10-23 12:05 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.
>
> This patch adds basic infrastructure, but does not yet enable KCSAN for
> any architecture.
>
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Elaborate comment about instrumentation calls emitted by compilers.
> * Replace kcsan_check_access(.., {true, false}) with
>   kcsan_check_{read,write} for improved readability.
> * Change bug title of race of unknown origin to just say "data-race in".
> * Refine "Key Properties" in kcsan.rst, and mention observed slow-down.
> * Add comment about safety of find_watchpoint without user_access_save.
> * Remove unnecessary preempt_disable/enable and elaborate on comment why
>   we want to disable interrupts and preemptions.
> * Use common struct kcsan_ctx in task_struct and for per-CPU interrupt
>   contexts [Suggested by Mark Rutland].
> ---
>  Documentation/dev-tools/kcsan.rst | 203 ++++++++++++++
>  MAINTAINERS                       |  11 +
>  Makefile                          |   3 +-
>  include/linux/compiler-clang.h    |   9 +
>  include/linux/compiler-gcc.h      |   7 +
>  include/linux/compiler.h          |  35 ++-
>  include/linux/kcsan-checks.h      | 147 ++++++++++
>  include/linux/kcsan.h             | 108 ++++++++
>  include/linux/sched.h             |   4 +
>  init/init_task.c                  |   8 +
>  init/main.c                       |   2 +
>  kernel/Makefile                   |   1 +
>  kernel/kcsan/Makefile             |  14 +
>  kernel/kcsan/atomic.c             |  21 ++
>  kernel/kcsan/core.c               | 428 ++++++++++++++++++++++++++++++
>  kernel/kcsan/debugfs.c            | 225 ++++++++++++++++
>  kernel/kcsan/encoding.h           |  94 +++++++
>  kernel/kcsan/kcsan.c              |  86 ++++++
>  kernel/kcsan/kcsan.h              | 140 ++++++++++
>  kernel/kcsan/report.c             | 306 +++++++++++++++++++++
>  kernel/kcsan/test.c               | 117 ++++++++
>  lib/Kconfig.debug                 |   2 +
>  lib/Kconfig.kcsan                 |  88 ++++++
>  lib/Makefile                      |   3 +
>  scripts/Makefile.kcsan            |   6 +
>  scripts/Makefile.lib              |  10 +
>  26 files changed, 2069 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/dev-tools/kcsan.rst
>  create mode 100644 include/linux/kcsan-checks.h
>  create mode 100644 include/linux/kcsan.h
>  create mode 100644 kernel/kcsan/Makefile
>  create mode 100644 kernel/kcsan/atomic.c
>  create mode 100644 kernel/kcsan/core.c
>  create mode 100644 kernel/kcsan/debugfs.c
>  create mode 100644 kernel/kcsan/encoding.h
>  create mode 100644 kernel/kcsan/kcsan.c
>  create mode 100644 kernel/kcsan/kcsan.h
>  create mode 100644 kernel/kcsan/report.c
>  create mode 100644 kernel/kcsan/test.c
>  create mode 100644 lib/Kconfig.kcsan
>  create mode 100644 scripts/Makefile.kcsan
>
> diff --git a/Documentation/dev-tools/kcsan.rst b/Documentation/dev-tools/kcsan.rst
> new file mode 100644
> index 000000000000..497b09e5cc96
> --- /dev/null
> +++ b/Documentation/dev-tools/kcsan.rst
> @@ -0,0 +1,203 @@
> +The Kernel Concurrency Sanitizer (KCSAN)
> +========================================
> +
> +Overview
> +--------
> +
> +*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data-race detector for
> +kernel space. KCSAN is a sampling watchpoint-based data-race detector -- this
> +is unlike Kernel Thread Sanitizer (KTSAN), which is a happens-before data-race
> +detector. Key priorities in KCSAN's design are lack of false positives,
> +scalability, and simplicity. More details can be found in `Implementation
> +Details`_.
> +
> +KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
> +supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
> +With Clang it requires version 7.0.0 or later.
> +
> +Usage
> +-----
> +
> +To enable KCSAN, configure the kernel with::
> +
> +    CONFIG_KCSAN=y
> +
> +KCSAN provides several other configuration options to customize behaviour (see
> +their respective help text for more info).
> +
> +debugfs
> +~~~~~~~
> +
> +* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
> +
> +* KCSAN can be turned on or off by writing ``on`` or ``off`` to
> +  ``/sys/kernel/debug/kcsan``.
> +
> +* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
> +  ``some_func_name`` to the report filter list, which (by default) blacklists
> +  reporting of data-races where either of the top stack frames is a function
> +  in the list.
> +
> +* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
> +  changes the report filtering behaviour. For example, the blacklist feature
> +  can be used to silence frequently occurring data-races; the whitelist feature
> +  can help with reproduction and testing of fixes.
> +
> +Error reports
> +~~~~~~~~~~~~~
> +
> +A typical data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
> +
> +    write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
> +     kernfs_refresh_inode+0x70/0x170
> +     kernfs_iop_permission+0x4f/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     vfs_statx+0x9b/0x130
> +     __do_sys_newlstat+0x50/0xb0
> +     __x64_sys_newlstat+0x37/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
> +     generic_permission+0x5b/0x2a0
> +     kernfs_iop_permission+0x66/0x90
> +     inode_permission+0x190/0x200
> +     link_path_walk.part.0+0x503/0x8e0
> +     path_lookupat.isra.0+0x69/0x4d0
> +     filename_lookup+0x136/0x280
> +     user_path_at_empty+0x47/0x60
> +     do_faccessat+0x11a/0x390
> +     __x64_sys_access+0x3c/0x50
> +     do_syscall_64+0x85/0x260
> +     entry_SYSCALL_64_after_hwframe+0x44/0xa9
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +The header of the report provides a short summary of the functions involved in
> +the race. It is followed by the access types and stack traces of the 2 threads
> +involved in the data-race.
> +
> +The other less common type of data-race report looks like this::
> +
> +    ==================================================================
> +    BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
> +
> +    race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
> +     e1000_clean_rx_irq+0x551/0xb10
> +     e1000_clean+0x533/0xda0
> +     net_rx_action+0x329/0x900
> +     __do_softirq+0xdb/0x2db
> +     irq_exit+0x9b/0xa0
> +     do_IRQ+0x9c/0xf0
> +     ret_from_intr+0x0/0x18
> +     default_idle+0x3f/0x220
> +     arch_cpu_idle+0x21/0x30
> +     do_idle+0x1df/0x230
> +     cpu_startup_entry+0x14/0x20
> +     rest_init+0xc5/0xcb
> +     arch_call_rest_init+0x13/0x2b
> +     start_kernel+0x6db/0x700
> +
> +    Reported by Kernel Concurrency Sanitizer on:
> +    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
> +    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> +    ==================================================================
> +
> +This report is generated when it was not possible to determine the other
> +racing thread, but a race was inferred because the data value of the watched
> +memory location changed. Such reports can result from missing
> +instrumentation or, for example, DMA accesses.
> +
> +Data-Races
> +----------
> +
> +Informally, two operations *conflict* if they access the same memory location,
> +and at least one of them is a write operation. In an execution, two memory
> +operations from different threads form a **data-race** if they *conflict*, at
> +least one of them is a *plain access* (non-atomic), and they are *unordered* in
> +the "happens-before" order according to the `LKMM
> +<../../tools/memory-model/Documentation/explanation.txt>`_.
> +
> +Relationship with the Linux Kernel Memory Model (LKMM)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The LKMM defines the propagation and ordering rules of various memory
> +operations, which gives developers the ability to reason about concurrent code.
> +Ultimately this allows determining the possible executions of concurrent
> +code, and whether that code is free from data-races.
> +
> +KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
> +``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
> +words, KCSAN assumes that as long as a plain access is not observed to race
> +with another conflicting access, memory operations are correctly ordered.
> +
> +This means that KCSAN will not report *potential* data-races due to missing
> +memory ordering. If, however, missing memory ordering (that is observable with
> +a particular compiler and architecture) leads to an observable data-race (e.g.
> +entering a critical section erroneously), KCSAN would report the resulting
> +data-race.
> +
> +Implementation Details
> +----------------------
> +
> +The general approach is inspired by `DataCollider
> +<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
> +Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
> +relies on compiler instrumentation. Watchpoints are implemented using an
> +efficient encoding that stores access type, size, and address in a long; the
> +benefits of using "soft watchpoints" are portability and greater flexibility in
> +limiting which accesses trigger a watchpoint.
> +
> +More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
> +memory operations; for each instrumented plain access:
> +
> +1. Check if a matching watchpoint exists; if yes, and at least one access is a
> +   write, then we encountered a racing access.
> +
> +2. Periodically, if no matching watchpoint exists, set up a watchpoint and
> +   stall for a small delay.
> +
> +3. Also check the data value before the delay, and re-check the data value
> +   after delay; if the values mismatch, we infer a race of unknown origin.
> +
> +To detect data-races between plain and atomic memory operations, KCSAN also
> +annotates atomic accesses, but only to check if a watchpoint exists
> +(``kcsan_check_atomic_*``); i.e.  KCSAN never sets up a watchpoint on atomic
> +accesses.
> +
> +Key Properties
> +~~~~~~~~~~~~~~
> +
> +1. **Memory Overhead:** No shadow memory is required. The current
> +   implementation uses a small array of longs to encode watchpoint information,
> +   which is negligible.
> +
> +2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
> +   efficient watchpoint encoding that does not require acquiring any shared
> +   locks in the fast-path. For kernel boot with a default config on a system
> +   where nproc=8 we measure a slow-down of 10-15x.
> +
> +3. **Memory Ordering:** KCSAN is *not* aware of the LKMM's ordering rules. This
> +   may result in missed data-races (false negatives), compared to a
> +   happens-before data-race detector.
> +
> +4. **Accuracy:** Imprecise, since it uses a sampling strategy.
> +
> +5. **Annotation Overheads:** Minimal annotation is required outside the KCSAN
> +   runtime. With a happens-before data-race detector, any omission leads to
> +   false positives, which is especially important in the context of the kernel
> +   which includes numerous custom synchronization mechanisms. With KCSAN, as a
> +   result, maintenance overheads are minimal as the kernel evolves.
> +
> +6. **Detects Racy Writes from Devices:** Due to checking data values upon
> +   setting up watchpoints, racy writes from devices can also be detected.
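
The three steps listed under "Implementation Details" can be sketched as a small single-threaded userspace model. This is not the code from kernel/kcsan/core.c: every name here (check_access, soft_watchpoint, SAMPLE_EVERY, the writer callbacks) is hypothetical, the "delay" is a callback standing in for whatever another CPU does while the watchpoint is armed, and matching is by exact address only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS    64
#define SAMPLE_EVERY 2	/* stands in for CONFIG_KCSAN_WATCH_SKIP_INST */

enum race_result { NO_RACE, RACE_WATCHPOINT_HIT, RACE_UNKNOWN_ORIGIN };

struct soft_watchpoint {
	const void *addr;
	size_t size;	/* recorded only; the sketch matches exact addresses */
	bool is_write;
	bool in_use;
	bool hit;	/* set by a conflicting access during the delay */
};

typedef void (*delay_fn)(void *arg);

static struct soft_watchpoint slots[NUM_SLOTS];
static unsigned int access_counter;

static struct soft_watchpoint *slot_for(const void *addr)
{
	return &slots[((uintptr_t)addr / sizeof(long)) % NUM_SLOTS];
}

static enum race_result check_access(void *ptr, size_t size, bool is_write,
				     delay_fn delay, void *arg)
{
	struct soft_watchpoint *wp = slot_for(ptr);
	unsigned char before[sizeof(long)], after[sizeof(long)];
	enum race_result res = NO_RACE;

	assert(size >= 1 && size <= sizeof(before));

	/* Step 1: a matching watchpoint plus at least one write is a race. */
	if (wp->in_use && wp->addr == ptr && (is_write || wp->is_write)) {
		wp->hit = true;
		return RACE_WATCHPOINT_HIT;
	}

	/* Step 2: only every SAMPLE_EVERY-th access sets up a watchpoint. */
	if (++access_counter % SAMPLE_EVERY != 0 || wp->in_use)
		return NO_RACE;

	*wp = (struct soft_watchpoint){ .addr = ptr, .size = size,
					.is_write = is_write, .in_use = true };

	/* Step 3: snapshot the value, stall, then compare. */
	memcpy(before, ptr, size);
	delay(arg);
	memcpy(after, ptr, size);

	if (wp->hit)
		res = RACE_WATCHPOINT_HIT;
	else if (memcmp(before, after, size) != 0)
		res = RACE_UNKNOWN_ORIGIN;

	wp->in_use = false;
	return res;
}

/* Stand-ins for what another CPU might do during the delay. */
static void idle(void *arg)
{
	(void)arg;
}

static void instrumented_writer(void *arg)
{
	int *p = arg;

	check_access(p, sizeof(*p), true, idle, NULL);
	*p += 1;
}

static void uninstrumented_writer(void *arg)
{
	*(int *)arg += 1;	/* e.g. a device write or uninstrumented code */
}
```

A sampled read that is delayed while instrumented_writer() runs is reported through the watchpoint hit, whereas uninstrumented_writer() can only be noticed through the value comparison, which corresponds to the "race at unknown origin" report format shown in the documentation.
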
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0154674cbad3..71f7fb625490 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8847,6 +8847,17 @@ F:       Documentation/kbuild/kconfig*
>  F:     scripts/kconfig/
>  F:     scripts/Kconfig.include
>
> +KCSAN
> +M:     Marco Elver <elver@google.com>
> +R:     Dmitry Vyukov <dvyukov@google.com>
> +L:     kasan-dev@googlegroups.com
> +S:     Maintained
> +F:     Documentation/dev-tools/kcsan.rst
> +F:     include/linux/kcsan*.h
> +F:     kernel/kcsan/
> +F:     lib/Kconfig.kcsan
> +F:     scripts/Makefile.kcsan
> +
>  KDUMP
>  M:     Dave Young <dyoung@redhat.com>
>  M:     Baoquan He <bhe@redhat.com>
> diff --git a/Makefile b/Makefile
> index ffd7a912fc46..ad4729176252 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
>
>  export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
>  export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
> -export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
> +export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
>  export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
>  export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
>  export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
> @@ -900,6 +900,7 @@ endif
>  include scripts/Makefile.kasan
>  include scripts/Makefile.extrawarn
>  include scripts/Makefile.ubsan
> +include scripts/Makefile.kcsan
>
>  # Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
>  KBUILD_CPPFLAGS += $(KCPPFLAGS)
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 333a6695a918..a213eb55e725 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,15 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(thread_sanitizer)
> +/* emulate gcc's __SANITIZE_THREAD__ flag */
> +#define __SANITIZE_THREAD__
> +#define __no_sanitize_thread \
> +               __attribute__((no_sanitize("thread")))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  /*
>   * Not all versions of clang implement the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d7ee4c6bad48..de105ca29282 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_thread__) && defined(__SANITIZE_THREAD__)
> +#define __no_sanitize_thread                                                   \
> +       __attribute__((__noinline__)) __attribute__((no_sanitize_thread))
> +#else
> +#define __no_sanitize_thread
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 5e88e7e33abe..350d80dbee4d 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>  #endif
>
>  #include <uapi/linux/types.h>
> +#include <linux/kcsan-checks.h>
>
>  #define __READ_ONCE_SIZE                                               \
>  ({                                                                     \
> @@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>         }                                                               \
>  })
>
> -static __always_inline
> -void __read_once_size(const volatile void *p, void *res, int size)
> -{
> -       __READ_ONCE_SIZE;
> -}
> -
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address conflicts
> @@ -211,14 +206,38 @@ void __read_once_size(const volatile void *p, void *res, int size)
>  # define __no_kasan_or_inline __always_inline
>  #endif
>
> -static __no_kasan_or_inline
> +#ifdef CONFIG_KCSAN
> +# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
> +#else
> +# define __no_kcsan_or_inline __always_inline
> +#endif
> +
> +#if defined(CONFIG_KASAN) || defined(CONFIG_KCSAN)
> +/* Avoid any instrumentation or inline. */
> +#define __no_sanitize_or_inline                                                \
> +       __no_sanitize_address __no_sanitize_thread notrace __maybe_unused
> +#else
> +#define __no_sanitize_or_inline __always_inline
> +#endif
> +
> +static __no_kcsan_or_inline
> +void __read_once_size(const volatile void *p, void *res, int size)
> +{
> +       kcsan_check_atomic_read((const void *)p, size);
> +       __READ_ONCE_SIZE;
> +}
> +
> +static __no_sanitize_or_inline
>  void __read_once_size_nocheck(const volatile void *p, void *res, int size)
>  {
>         __READ_ONCE_SIZE;
>  }
>
> -static __always_inline void __write_once_size(volatile void *p, void *res, int size)
> +static __no_kcsan_or_inline
> +void __write_once_size(volatile void *p, void *res, int size)
>  {
> +       kcsan_check_atomic_write((const void *)p, size);
> +
>         switch (size) {
>         case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
>         case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
> diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
> new file mode 100644
> index 000000000000..4203603ae852
> --- /dev/null
> +++ b/include/linux/kcsan-checks.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_CHECKS_H
> +#define _LINUX_KCSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * __kcsan_*: Always available when KCSAN is enabled. This may be used
> + * even in compilation units that selectively disable KCSAN, but must use KCSAN
> + * to validate access to an address.   Never use these in header files!
> + */
> +#ifdef CONFIG_KCSAN
> +/**
> + * __kcsan_check_watchpoint - check if a watchpoint exists
> + *
> + * Returns true if no race was detected, and we may then proceed to set up a
> + * watchpoint after. Returns false if either KCSAN is disabled or a race was
> + * encountered, and we may not set up a watchpoint after.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + * @return true if no race was detected, false otherwise.
> + */
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +
> +/**
> + * __kcsan_setup_watchpoint - set up watchpoint and report data-races
> + *
> + * Sets up a watchpoint (if sampled), and if a racing access was observed,
> + * reports the data-race.
> + *
> + * @ptr address of access
> + * @size size of access
> + * @is_write is access a write
> + */
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write);
> +#else
> +static inline bool __kcsan_check_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +       return true;
> +}
> +static inline void __kcsan_setup_watchpoint(const volatile void *ptr,
> +                                           size_t size, bool is_write)
> +{
> +}
> +#endif
> +
> +/*
> + * kcsan_*: Only available when the particular compilation unit has KCSAN
> + * instrumentation enabled. May be used in header files.
> + */
> +#ifdef __SANITIZE_THREAD__
> +#define kcsan_check_watchpoint __kcsan_check_watchpoint
> +#define kcsan_setup_watchpoint __kcsan_setup_watchpoint
> +#else
> +static inline bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +       return true;
> +}
> +static inline void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                                         bool is_write)
> +{
> +}
> +#endif
> +
> +/**
> + * __kcsan_check_read - check regular read access for data-races
> + *
> + * Full read access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled. Note that setting up watchpoints for plain reads is
> + * required to also detect data-races with atomic accesses.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_read(ptr, size)                                          \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, false))                \
> +                       __kcsan_setup_watchpoint(ptr, size, false);            \
> +       } while (0)
> +
> +/**
> + * __kcsan_check_write - check regular write access for data-races
> + *
> + * Full write access that checks watchpoint and sets up a watchpoint if this
> + * access is sampled.
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define __kcsan_check_write(ptr, size)                                         \
> +       do {                                                                   \
> +               if (__kcsan_check_watchpoint(ptr, size, true) &&               \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       __kcsan_setup_watchpoint(ptr, size, true);             \
> +       } while (0)
> +
> +/**
> + * kcsan_check_read - check regular read access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_read(ptr, size)                                            \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, false))                  \
> +                       kcsan_setup_watchpoint(ptr, size, false);              \
> +       } while (0)
> +
> +/**
> + * kcsan_check_write - check regular write access for data-races
> + *
> + * @ptr address of access
> + * @size size of access
> + */
> +#define kcsan_check_write(ptr, size)                                           \
> +       do {                                                                   \
> +               if (kcsan_check_watchpoint(ptr, size, true) &&                 \
> +                   !IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE))        \
> +                       kcsan_setup_watchpoint(ptr, size, true);               \
> +       } while (0)
> +
> +/*
> + * Check for atomic accesses: if atomic accesses are not ignored, this simply
> + * aliases to kcsan_check_watchpoint, otherwise becomes a no-op.
> + */
> +#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
> +#define kcsan_check_atomic_read(...)                                           \
> +       do {                                                                   \
> +       } while (0)
> +#define kcsan_check_atomic_write(...)                                          \
> +       do {                                                                   \
> +       } while (0)
> +#else
> +#define kcsan_check_atomic_read(ptr, size)                                     \
> +       kcsan_check_watchpoint(ptr, size, false)
> +#define kcsan_check_atomic_write(ptr, size)                                    \
> +       kcsan_check_watchpoint(ptr, size, true)
> +#endif
> +
> +#endif /* _LINUX_KCSAN_CHECKS_H */
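
To illustrate the layering in this header, here is a userspace sketch in which the two primitives are replaced by counting stubs. The macro shapes follow the quoted header, but the stubs, the PLAIN_WRITE_PRETEND_ONCE constant (standing in for IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE)) and the copy_word() helper are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a plain 0/1 constant replaces the IS_ENABLED() test. */
#define PLAIN_WRITE_PRETEND_ONCE 0

static int checks, setups;	/* call counters for the stubs */

/* Stub: always reports "no race", so the caller may set up a watchpoint. */
static bool kcsan_check_watchpoint(const volatile void *ptr, size_t size,
				   bool is_write)
{
	(void)ptr; (void)size; (void)is_write;
	checks++;
	return true;
}

/* Stub: only counts how often a watchpoint setup was attempted. */
static void kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
				   bool is_write)
{
	(void)ptr; (void)size; (void)is_write;
	setups++;
}

/* Plain accesses: check first, then possibly set up a watchpoint. */
#define kcsan_check_read(ptr, size)					\
	do {								\
		if (kcsan_check_watchpoint(ptr, size, false))		\
			kcsan_setup_watchpoint(ptr, size, false);	\
	} while (0)

#define kcsan_check_write(ptr, size)					\
	do {								\
		if (kcsan_check_watchpoint(ptr, size, true) &&		\
		    !PLAIN_WRITE_PRETEND_ONCE)				\
			kcsan_setup_watchpoint(ptr, size, true);	\
	} while (0)

/* Marked atomic accesses: check only, never set up a watchpoint. */
#define kcsan_check_atomic_read(ptr, size)				\
	kcsan_check_watchpoint(ptr, size, false)

/* Hypothetical helper that annotates its accesses by hand. */
static void copy_word(long *dst, const long *src)
{
	assert(dst && src);
	kcsan_check_read(src, sizeof(*src));
	kcsan_check_write(dst, sizeof(*dst));
	*dst = *src;
}
```

The point of the split is visible in the counters: each plain access goes through both stubs, while kcsan_check_atomic_read() only ever reaches the check stub, matching the statement in kcsan.rst that KCSAN never sets up a watchpoint on atomic accesses.
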
> diff --git a/include/linux/kcsan.h b/include/linux/kcsan.h
> new file mode 100644
> index 000000000000..fd5de2ba3a16
> --- /dev/null
> +++ b/include/linux/kcsan.h
> @@ -0,0 +1,108 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _LINUX_KCSAN_H
> +#define _LINUX_KCSAN_H
> +
> +#include <linux/types.h>
> +#include <linux/kcsan-checks.h>
> +
> +#ifdef CONFIG_KCSAN
> +
> +/*
> + * Context for each thread of execution: for tasks, this is stored in
> + * task_struct, and interrupts access internal per-CPU storage.
> + */
> +struct kcsan_ctx {
> +       int disable; /* disable counter */
> +       int atomic_next; /* number of following atomic ops */
> +
> +       /*
> +        * We use separate variables to store if we are in a nestable or flat
> +        * atomic region. This helps make sure that an atomic region with
> +        * nesting support is not suddenly aborted when a flat region is
> +        * contained within. Effectively this allows supporting nesting flat
> +        * atomic regions within an outer nestable atomic region. Support for
> +        * this is required as there are cases where a seqlock reader critical
> +        * section (flat atomic region) is contained within a seqlock writer
> +        * critical section (nestable atomic region), and the "mismatching
> +        * kcsan_end_atomic()" warning would trigger otherwise.
> +        */
> +       int atomic_region;
> +       bool atomic_region_flat;
> +};
> +
> +/**
> + * kcsan_init - initialize KCSAN runtime
> + */
> +void kcsan_init(void);
> +
> +/**
> + * kcsan_disable_current - disable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_disable_current(void);
> +
> +/**
> + * kcsan_enable_current - re-enable KCSAN for the current context
> + *
> + * Supports nesting.
> + */
> +void kcsan_enable_current(void);
> +
> +/**
> + * kcsan_begin_atomic - use to denote an atomic region
> + *
> + * Accesses within the atomic region may appear to race with other accesses but
> + * should be considered atomic.
> + *
> + * @nest true if regions may be nested, or false for flat region
> + */
> +void kcsan_begin_atomic(bool nest);
> +
> +/**
> + * kcsan_end_atomic - end atomic region
> + *
> + * @nest must match argument to kcsan_begin_atomic().
> + */
> +void kcsan_end_atomic(bool nest);
> +
> +/**
> + * kcsan_atomic_next - consider following accesses as atomic
> + *
> + * Force treating the next n memory accesses for the current context as atomic
> + * operations.
> + *
> + * @n number of following memory accesses to treat as atomic.
> + */
> +void kcsan_atomic_next(int n);
> +
> +#else /* CONFIG_KCSAN */
> +
> +static inline void kcsan_init(void)
> +{
> +}
> +
> +static inline void kcsan_disable_current(void)
> +{
> +}
> +
> +static inline void kcsan_enable_current(void)
> +{
> +}
> +
> +static inline void kcsan_begin_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_end_atomic(bool nest)
> +{
> +}
> +
> +static inline void kcsan_atomic_next(int n)
> +{
> +}
> +
> +#endif /* CONFIG_KCSAN */
> +
> +#endif /* _LINUX_KCSAN_H */
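
The comment in struct kcsan_ctx about nestable versus flat atomic regions may be easier to follow with a small model. This sketch is an assumption about how the two fields could be used and is not the patch's core.c; the context is passed explicitly, and the names ctx_sketch, begin_atomic, end_atomic and in_atomic_region are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the two atomic-region fields of struct kcsan_ctx. */
struct ctx_sketch {
	int atomic_region;		/* depth of nestable regions */
	bool atomic_region_flat;	/* inside a flat region? */
};

static void begin_atomic(struct ctx_sketch *c, bool nest)
{
	if (nest)
		c->atomic_region++;
	else
		c->atomic_region_flat = true;
}

static void end_atomic(struct ctx_sketch *c, bool nest)
{
	if (nest) {
		/* A mismatching end would underflow the counter. */
		assert(c->atomic_region > 0);
		c->atomic_region--;
	} else {
		c->atomic_region_flat = false;
	}
}

/* Accesses are treated as atomic while either kind of region is open. */
static bool in_atomic_region(const struct ctx_sketch *c)
{
	return c->atomic_region > 0 || c->atomic_region_flat;
}
```

With a single shared flag, ending a flat region (a seqlock reader section) inside a nestable one (a seqlock writer section) would also end the outer region; keeping the counter and the flag separate avoids that.
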
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..9490e417bf4a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -31,6 +31,7 @@
>  #include <linux/task_io_accounting.h>
>  #include <linux/posix-timers.h>
>  #include <linux/rseq.h>
> +#include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
>  struct audit_context;
> @@ -1171,6 +1172,9 @@ struct task_struct {
>  #ifdef CONFIG_KASAN
>         unsigned int                    kasan_depth;
>  #endif
> +#ifdef CONFIG_KCSAN
> +       struct kcsan_ctx                kcsan_ctx;
> +#endif
>
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>         /* Index of current stored address in ret_stack: */
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5eab7b..e229416c3314 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -161,6 +161,14 @@ struct task_struct init_task
>  #ifdef CONFIG_KASAN
>         .kasan_depth    = 1,
>  #endif
> +#ifdef CONFIG_KCSAN
> +       .kcsan_ctx = {
> +               .disable                = 1,
> +               .atomic_next            = 0,
> +               .atomic_region          = 0,
> +               .atomic_region_flat     = 0,
> +       },
> +#endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>         .softirqs_enabled = 1,
>  #endif
> diff --git a/init/main.c b/init/main.c
> index 91f6ebb30ef0..4d814de017ee 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -93,6 +93,7 @@
>  #include <linux/rodata_test.h>
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kcsan.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
>         acpi_subsystem_init();
>         arch_post_acpi_subsys_init();
>         sfi_init_late();
> +       kcsan_init();
>
>         /* Do the rest non-__init'ed, we're now alive */
>         arch_call_rest_init();
> diff --git a/kernel/Makefile b/kernel/Makefile
> index daad787fb795..74ab46e2ebd1 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
>  obj-$(CONFIG_IRQ_WORK) += irq_work.o
>  obj-$(CONFIG_CPU_PM) += cpu_pm.o
>  obj-$(CONFIG_BPF) += bpf/
> +obj-$(CONFIG_KCSAN) += kcsan/
>
>  obj-$(CONFIG_PERF_EVENTS) += events/
>
> diff --git a/kernel/kcsan/Makefile b/kernel/kcsan/Makefile
> new file mode 100644
> index 000000000000..c25f07062d26
> --- /dev/null
> +++ b/kernel/kcsan/Makefile
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +KCSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +
> +CFLAGS_REMOVE_kcsan.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_atomic.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_kcsan.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_core.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +CFLAGS_atomic.o = $(call cc-option, -fno-conserve-stack -fno-stack-protector)
> +
> +obj-y := kcsan.o core.o atomic.o debugfs.o report.o
> +obj-$(CONFIG_KCSAN_SELFTEST) += test.o
> diff --git a/kernel/kcsan/atomic.c b/kernel/kcsan/atomic.c
> new file mode 100644
> index 000000000000..dd44f7d9e491
> --- /dev/null
> +++ b/kernel/kcsan/atomic.c
> @@ -0,0 +1,21 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/jiffies.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * List all volatile globals that have been observed in races, to suppress
> + * data-race reports between accesses to these variables.
> + *
> + * For now, we assume that volatile accesses of globals are as strong as atomic
> + * accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
> + * entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
> + * than cast to volatile. Eventually, we hope to be able to remove this
> + * function.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr)
> +{
> +       /* only jiffies for now */
> +       return ptr == &jiffies;
> +}
> diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> new file mode 100644
> index 000000000000..bc8d60b129eb
> --- /dev/null
> +++ b/kernel/kcsan/core.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bug.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/init.h>
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/random.h>
> +#include <linux/sched.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Helper macros to iterate slots, starting from address slot itself, followed
> + * by the right and left slots.
> + */
> +#define CHECK_NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
> +#define SLOT_IDX(slot, i)                                                      \
> +       ((slot + (((i + KCSAN_CHECK_ADJACENT) % CHECK_NUM_SLOTS) -             \
> +                 KCSAN_CHECK_ADJACENT)) %                                     \
> +        KCSAN_NUM_WATCHPOINTS)
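
To make the iteration order concrete, here is a minimal standalone sketch of
the offset arithmetic in SLOT_IDX. It assumes KCSAN_CHECK_ADJACENT is 1 (the
real value is defined in kcsan.h, outside this hunk); slot_offset() and
wrap_slot() are hypothetical helper names, not part of the patch.

```c
#include <assert.h>

/* Assumed value for illustration only; the real one lives in kcsan.h. */
#define CHECK_ADJACENT 1
#define NUM_CHECKED (1 + 2 * CHECK_ADJACENT)

/* Offset from the home slot that iteration i visits. */
static int slot_offset(int i)
{
	assert(i >= 0 && i < NUM_CHECKED);
	return ((i + CHECK_ADJACENT) % NUM_CHECKED) - CHECK_ADJACENT;
}

/* Map (slot + offset) into [0, num_watchpoints), also for negative sums. */
static int wrap_slot(int slot, int offset, int num_watchpoints)
{
	int idx = (slot + offset) % num_watchpoints;

	return idx < 0 ? idx + num_watchpoints : idx;
}
```

With one adjacent slot the offsets come out as 0, +1, -1, which matches the
comment. In C, % with a negative left operand yields a non-positive result,
so the sketch wraps explicitly. Whether the macro needs the same care depends
on the type of KCSAN_NUM_WATCHPOINTS, which I cannot see from here.
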
> +
> +bool kcsan_enabled;
> +
> +/* Per-CPU kcsan_ctx for interrupts */
> +static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
> +       .disable = 0,
> +       .atomic_next = 0,
> +       .atomic_region = 0,
> +       .atomic_region_flat = 0,
> +};
> +
> +/*
> + * Watchpoints, with each entry encoded as defined in encoding.h: in order to be
> + * able to safely update and access a watchpoint without introducing locking
> + * overhead, we encode each watchpoint as a single atomic long. The initial
> + * zero-initialized state matches INVALID_WATCHPOINT.
> + */
> +static atomic_long_t watchpoints[KCSAN_NUM_WATCHPOINTS];
> +
> +/*
> + * Instructions skipped counter; see should_watch().
> + */
> +static DEFINE_PER_CPU(unsigned long, kcsan_skip);
> +
> +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
> +                                            bool expect_write,
> +                                            long *encoded_watchpoint)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
> +       atomic_long_t *watchpoint;
> +       unsigned long wp_addr_masked;
> +       size_t wp_size;
> +       bool is_write;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               *encoded_watchpoint = atomic_long_read(watchpoint);
> +               if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> +                                      &wp_size, &is_write))
> +                       continue;
> +
> +               if (expect_write && !is_write)
> +                       continue;
> +
> +               /* Check if the watchpoint matches the access. */
> +               if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> +                                              bool is_write)
> +{
> +       const int slot = watchpoint_slot(addr);
> +       const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> +       atomic_long_t *watchpoint;
> +       int i;
> +
> +       for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> +               long expect_val = INVALID_WATCHPOINT;
> +
> +               /* Try to acquire this slot. */
> +               watchpoint = &watchpoints[SLOT_IDX(slot, i)];
> +               if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> +                                                   encoded_watchpoint))
> +                       return watchpoint;
> +       }
> +
> +       return NULL;
> +}
> +
> +/*
> + * Return true if watchpoint was successfully consumed, false otherwise.
> + *
> + * This may return false if:
> + *
> + *     1. another thread already consumed the watchpoint;
> + *     2. the thread that set up the watchpoint already removed it;
> + *     3. the watchpoint was removed and then re-used.
> + */
> +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> +                                         long encoded_watchpoint)
> +{
> +       return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> +                                              CONSUMED_WATCHPOINT);
> +}
> +
> +/*
> + * Return true if watchpoint was not touched, false if consumed.
> + */
> +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> +{
> +       return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> +              CONSUMED_WATCHPOINT;
> +}
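
For readers following along, the three helpers above form a small state
machine per slot: INVALID -> encoded -> (CONSUMED) -> INVALID. Below is a
userspace sketch of that lifecycle using C11 atomics in place of the kernel's
atomic_long_* API. The wp_* names are hypothetical and the memory orderings
just mirror the _relaxed variants used in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define INVALID_WP  0L
#define CONSUMED_WP 1L

/* Owner claims a free slot: INVALID -> encoded. */
static bool wp_insert(atomic_long *slot, long encoded)
{
	long expect = INVALID_WP;

	/* 0 and 1 are reserved states and must never be stored as a value. */
	assert(encoded != INVALID_WP && encoded != CONSUMED_WP);
	return atomic_compare_exchange_strong_explicit(
		slot, &expect, encoded,
		memory_order_relaxed, memory_order_relaxed);
}

/* Racing access marks the slot: encoded -> CONSUMED, at most once. */
static bool wp_consume(atomic_long *slot, long encoded)
{
	return atomic_compare_exchange_strong_explicit(
		slot, &encoded, CONSUMED_WP,
		memory_order_relaxed, memory_order_relaxed);
}

/* Owner frees the slot; returns true if nobody consumed it meanwhile. */
static bool wp_remove(atomic_long *slot)
{
	return atomic_exchange_explicit(slot, INVALID_WP,
					memory_order_relaxed) != CONSUMED_WP;
}
```

A false return from wp_remove() is what tells the owner that another access
hit the watchpoint during the delay.
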
> +
> +static inline struct kcsan_ctx *get_ctx(void)
> +{
> +       /*
> +        * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> +        * also result in calls that generate warnings in uaccess regions.
> +        */
> +       return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> +}
> +
> +static inline bool is_atomic(const volatile void *ptr)
> +{
> +       struct kcsan_ctx *ctx = get_ctx();
> +
> +       if (unlikely(ctx->atomic_next > 0)) {
> +               --ctx->atomic_next;
> +               return true;
> +       }
> +       if (unlikely(ctx->atomic_region > 0 || ctx->atomic_region_flat))
> +               return true;
> +
> +       return kcsan_is_atomic(ptr);
> +}
> +
> +static inline bool should_watch(const volatile void *ptr)
> +{
> +       /*
> +        * Never set up watchpoints when memory operations are atomic.
> +        *
> +        * We need to check this first, because: 1) atomics should not count
> +        * towards skipped instructions below, and 2) to actually decrement
> +        * kcsan_atomic_next for each atomic.
> +        */
> +       if (is_atomic(ptr))
> +               return false;
> +
> +       /*
> +        * We use a per-CPU counter, to avoid excessive contention; there is
> +        * still enough non-determinism for the precise instructions that end up
> +        * being watched to be mostly unpredictable. Using a PRNG like
> +        * prandom_u32() turned out to be too slow.
> +        */
> +       return (this_cpu_inc_return(kcsan_skip) %
> +               CONFIG_KCSAN_WATCH_SKIP_INST) == 0;
> +}
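
The sampling decision in should_watch() boils down to "every Nth
non-atomic access". A minimal single-threaded sketch, with a plain counter
standing in for the per-CPU kcsan_skip and a hypothetical period parameter
standing in for CONFIG_KCSAN_WATCH_SKIP_INST:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true on every period-th call. The counter is incremented first,
 * as with this_cpu_inc_return() in the patch.
 */
static bool should_sample(unsigned long *skip_counter, unsigned long period)
{
	assert(period > 0);
	return (++*skip_counter % period) == 0;
}
```

With a period of 3 and a counter starting at 0, calls 3, 6, 9, ... return
true and all others return false.
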
> +
> +static inline bool is_enabled(void)
> +{
> +       return READ_ONCE(kcsan_enabled) && get_ctx()->disable == 0;
> +}
> +
> +static inline unsigned int get_delay(void)
> +{
> +       unsigned int max_delay = in_task() ? CONFIG_KCSAN_UDELAY_MAX_TASK :
> +                                            CONFIG_KCSAN_UDELAY_MAX_INTERRUPT;
> +       return IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> +                      ((prandom_u32() % max_delay) + 1) :
> +                      max_delay;
> +}
> +
> +/* === Public interface ===================================================== */
> +
> +void __init kcsan_init(void)
> +{
> +       BUG_ON(!in_task());
> +
> +       kcsan_debugfs_init();
> +       kcsan_enable_current();
> +#ifdef CONFIG_KCSAN_EARLY_ENABLE
> +       /*
> +        * We are in the init task, and no other tasks should be running.
> +        */
> +       WRITE_ONCE(kcsan_enabled, true);
> +#endif
> +}
> +
> +/* === Exported interface =================================================== */
> +
> +void kcsan_disable_current(void)
> +{
> +       ++get_ctx()->disable;
> +}
> +EXPORT_SYMBOL(kcsan_disable_current);
> +
> +void kcsan_enable_current(void)
> +{
> +       if (get_ctx()->disable-- == 0) {
> +               kcsan_disable_current(); /* restore to 0 */
> +               kcsan_disable_current();
> +               WARN(1, "mismatching %s", __func__);
> +               kcsan_enable_current();
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_enable_current);
> +
> +void kcsan_begin_atomic(bool nest)
> +{
> +       if (nest)
> +               ++get_ctx()->atomic_region;
> +       else
> +               get_ctx()->atomic_region_flat = true;
> +}
> +EXPORT_SYMBOL(kcsan_begin_atomic);
> +
> +void kcsan_end_atomic(bool nest)
> +{
> +       if (nest) {
> +               if (get_ctx()->atomic_region-- == 0) {
> +                       kcsan_begin_atomic(true); /* restore to 0 */
> +                       kcsan_disable_current();
> +                       WARN(1, "mismatching %s", __func__);
> +                       kcsan_enable_current();
> +               }
> +       } else {
> +               get_ctx()->atomic_region_flat = false;
> +       }
> +}
> +EXPORT_SYMBOL(kcsan_end_atomic);
> +
> +void kcsan_atomic_next(int n)
> +{
> +       get_ctx()->atomic_next = n;
> +}
> +EXPORT_SYMBOL(kcsan_atomic_next);
> +
> +bool __kcsan_check_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       long encoded_watchpoint;
> +       unsigned long flags;
> +       enum kcsan_report_type report_type;
> +
> +       if (unlikely(!is_enabled()))
> +               return false;
> +
> +       /*
> +        * Avoid user_access_save in fast-path here: find_watchpoint is safe
> +        * without user_access_save, as the address that ptr points to is only
> +        * used to check if a watchpoint exists; ptr is never dereferenced.
> +        */
> +       watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> +                                    &encoded_watchpoint);
> +       if (watchpoint == NULL)
> +               return true;
> +
> +       flags = user_access_save();

Why do we need user_access_save?

> +       if (!try_consume_watchpoint(watchpoint, encoded_watchpoint)) {
> +               /*
> +                * The other thread may not print any diagnostics, as it has
> +                * already removed the watchpoint, or another thread consumed
> +                * the watchpoint before this thread.
> +                */
> +               kcsan_counter_inc(kcsan_counter_report_races);
> +               report_type = kcsan_report_race_check_race;
> +       } else {
> +               report_type = kcsan_report_race_check;
> +       }
> +
> +       /* Encountered a data-race. */
> +       kcsan_counter_inc(kcsan_counter_data_races);
> +       kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> +
> +       user_access_restore(flags);
> +       return false;
> +}
> +EXPORT_SYMBOL(__kcsan_check_watchpoint);
> +
> +void __kcsan_setup_watchpoint(const volatile void *ptr, size_t size,
> +                             bool is_write)
> +{
> +       atomic_long_t *watchpoint;
> +       union {
> +               u8 _1;
> +               u16 _2;
> +               u32 _4;
> +               u64 _8;
> +       } expect_value;
> +       bool is_expected = true;
> +       unsigned long ua_flags = user_access_save();

Why do we need user_access_save?

> +       unsigned long irq_flags;
> +
> +       if (!should_watch(ptr))
> +               goto out;
> +
> +       if (!check_encodable((unsigned long)ptr, size)) {
> +               kcsan_counter_inc(kcsan_counter_unencodable_accesses);
> +               goto out;
> +       }
> +
> +       /*
> +        * Disable interrupts & preemptions to avoid another thread on the same
> +        * CPU accessing memory locations for the set up watchpoint; this is to
> +        * avoid reporting races to e.g. CPU-local data.
> +        *
> +        * An alternative would be adding the source CPU to the watchpoint
> +        * encoding, and checking that watchpoint-CPU != this-CPU. There are
> +        * several problems with this:
> +        *   1. we should avoid stealing more bits from the watchpoint encoding
> +        *      as it would affect accuracy, as well as increase performance
> +        *      overhead in the fast-path;
> +        *   2. if we are preempted, but there *is* a genuine data-race, we
> +        *      would *not* report it -- since this is the common case (vs.
> +        *      CPU-local data accesses), it makes more sense (from a data-race
> +        *      detection PoV) to simply disable preemptions to ensure as many
> +        *      tasks as possible run on other CPUs.
> +        */
> +       local_irq_save(irq_flags);
> +
> +       watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> +       if (watchpoint == NULL) {
> +               /*
> +                * Out of capacity: the size of `watchpoints`, and the frequency
> +                * with which `should_watch()` returns true should be tweaked so
> +                * that this case happens very rarely.
> +                */
> +               kcsan_counter_inc(kcsan_counter_no_capacity);
> +               goto out_unlock;
> +       }
> +
> +       kcsan_counter_inc(kcsan_counter_setup_watchpoints);
> +       kcsan_counter_inc(kcsan_counter_used_watchpoints);
> +
> +       /*
> +        * Read the current value, to later check and infer a race if the data
> +        * was modified via a non-instrumented access, e.g. from a device.
> +        */
> +       switch (size) {
> +       case 1:
> +               expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +#ifdef CONFIG_KCSAN_DEBUG
> +       kcsan_disable_current();
> +       pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> +              is_write ? "write" : "read", size, ptr,
> +              watchpoint_slot((unsigned long)ptr),
> +              encode_watchpoint((unsigned long)ptr, size, is_write));
> +       kcsan_enable_current();
> +#endif
> +
> +       /*
> +        * Delay this thread, to increase probability of observing a racy
> +        * conflicting access.
> +        */
> +       udelay(get_delay());
> +
> +       /*
> +        * Re-read value, and check if it is as expected; if not, we infer a
> +        * racy access.
> +        */
> +       switch (size) {
> +       case 1:
> +               is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> +               break;
> +       case 2:
> +               is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> +               break;
> +       case 4:
> +               is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> +               break;
> +       case 8:
> +               is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> +               break;
> +       default:
> +               break; /* ignore; we do not diff the values */
> +       }
> +
> +       /* Check if this access raced with another. */
> +       if (!remove_watchpoint(watchpoint)) {
> +               /*
> +                * No need to increment 'race' counter, as the racing thread
> +                * already did.
> +                */
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_setup);
> +       } else if (!is_expected) {
> +               /* Inferring a race, since the value should not have changed. */
> +               kcsan_counter_inc(kcsan_counter_races_unknown_origin);
> +#ifdef CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +               kcsan_report(ptr, size, is_write, smp_processor_id(),
> +                            kcsan_report_race_unknown_origin);
> +#endif
> +       }
> +
> +       kcsan_counter_dec(kcsan_counter_used_watchpoints);
> +out_unlock:
> +       local_irq_restore(irq_flags);
> +out:
> +       user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__kcsan_setup_watchpoint);
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> new file mode 100644
> index 000000000000..6ddcbd185f3a
> --- /dev/null
> +++ b/kernel/kcsan/debugfs.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/atomic.h>
> +#include <linux/bsearch.h>
> +#include <linux/bug.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/kallsyms.h>
> +#include <linux/mm.h>
> +#include <linux/seq_file.h>
> +#include <linux/sort.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * Statistics counters.
> + */
> +static atomic_long_t counters[kcsan_counter_count];
> +
> +/*
> + * Addresses for filtering functions from reporting. This list can be used as a
> + * whitelist or blacklist.
> + */
> +static struct {
> +       unsigned long *addrs; /* array of addresses */
> +       size_t size; /* current size */
> +       int used; /* number of elements used */
> +       bool sorted; /* if elements are sorted */
> +       bool whitelist; /* true: whitelist; false: blacklist */
> +} report_filterlist = {
> +       .addrs = NULL,
> +       .size = 8, /* small initial size */
> +       .used = 0,
> +       .sorted = false,
> +       .whitelist = false, /* default is blacklist */
> +};
> +static DEFINE_SPINLOCK(report_filterlist_lock);
> +
> +static const char *counter_to_name(enum kcsan_counter_id id)
> +{
> +       switch (id) {
> +       case kcsan_counter_used_watchpoints:
> +               return "used_watchpoints";
> +       case kcsan_counter_setup_watchpoints:
> +               return "setup_watchpoints";
> +       case kcsan_counter_data_races:
> +               return "data_races";
> +       case kcsan_counter_no_capacity:
> +               return "no_capacity";
> +       case kcsan_counter_report_races:
> +               return "report_races";
> +       case kcsan_counter_races_unknown_origin:
> +               return "races_unknown_origin";
> +       case kcsan_counter_unencodable_accesses:
> +               return "unencodable_accesses";
> +       case kcsan_counter_encoding_false_positives:
> +               return "encoding_false_positives";
> +       case kcsan_counter_count:
> +               BUG();
> +       }
> +       return NULL;
> +}
> +
> +void kcsan_counter_inc(enum kcsan_counter_id id)
> +{
> +       atomic_long_inc(&counters[id]);
> +}
> +
> +void kcsan_counter_dec(enum kcsan_counter_id id)
> +{
> +       atomic_long_dec(&counters[id]);
> +}
> +
> +static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
> +{
> +       const unsigned long a = *(const unsigned long *)rhs;
> +       const unsigned long b = *(const unsigned long *)lhs;
> +
> +       return a < b ? -1 : a == b ? 0 : 1;
> +}
> +
> +bool kcsan_skip_report(unsigned long func_addr)
> +{
> +       unsigned long symbolsize, offset;
> +       unsigned long flags;
> +       bool ret = false;
> +
> +       if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
> +               return false;
> +       func_addr -= offset; /* get function start */
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       if (report_filterlist.used == 0)
> +               goto out;
> +
> +       /* Sort array if it is unsorted, and then do a binary search. */
> +       if (!report_filterlist.sorted) {
> +               sort(report_filterlist.addrs, report_filterlist.used,
> +                    sizeof(unsigned long), cmp_filterlist_addrs, NULL);
> +               report_filterlist.sorted = true;
> +       }
> +       ret = !!bsearch(&func_addr, report_filterlist.addrs,
> +                       report_filterlist.used, sizeof(unsigned long),
> +                       cmp_filterlist_addrs);
> +       if (report_filterlist.whitelist)
> +               ret = !ret;
> +
> +out:
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +       return ret;
> +}
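
The lookup logic above, reduced to a userspace sketch with qsort()/bsearch()
from the C library instead of the kernel's sort()/bsearch(). The names
cmp_addrs() and skip_report() are hypothetical, and locking and the kallsyms
lookup are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int cmp_addrs(const void *lhs, const void *rhs)
{
	const unsigned long a = *(const unsigned long *)lhs;
	const unsigned long b = *(const unsigned long *)rhs;

	return a < b ? -1 : a == b ? 0 : 1;
}

/*
 * addrs must already be sorted with cmp_addrs(). As in the patch, an empty
 * list never filters anything, regardless of whitelist/blacklist mode.
 */
static bool skip_report(const unsigned long *addrs, size_t used,
			bool whitelist, unsigned long func_start)
{
	bool found;

	assert(addrs != NULL || used == 0);
	if (used == 0)
		return false;

	found = bsearch(&func_start, addrs, used, sizeof(*addrs),
			cmp_addrs) != NULL;
	/* Blacklist: skip listed functions. Whitelist: skip unlisted ones. */
	return whitelist ? !found : found;
}
```

A caller would sort once after inserting, e.g. qsort(list, n,
sizeof(list[0]), cmp_addrs), and then query with the function start address.
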
> +
> +static void set_report_filterlist_whitelist(bool whitelist)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       report_filterlist.whitelist = whitelist;
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static void insert_report_filterlist(const char *func)
> +{
> +       unsigned long flags;
> +       unsigned long addr = kallsyms_lookup_name(func);
> +
> +       if (!addr) {
> +               pr_err("KCSAN: could not find function: '%s'\n", func);
> +               return;
> +       }
> +
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +
> +       if (report_filterlist.addrs == NULL)
> +               report_filterlist.addrs = /* initial allocation */
> +                       kvmalloc_array(report_filterlist.size,
> +                                      sizeof(unsigned long), GFP_KERNEL);
> +       else if (report_filterlist.used == report_filterlist.size) {
> +               /* resize filterlist */
> +               unsigned long *new_addrs;
> +
> +               report_filterlist.size *= 2;
> +               new_addrs = kvmalloc_array(report_filterlist.size,
> +                                          sizeof(unsigned long), GFP_KERNEL);
> +               memcpy(new_addrs, report_filterlist.addrs,
> +                      report_filterlist.used * sizeof(unsigned long));
> +               kvfree(report_filterlist.addrs);
> +               report_filterlist.addrs = new_addrs;
> +       }
> +
> +       /* Note: deduplicating should be done in userspace. */
> +       report_filterlist.addrs[report_filterlist.used++] =
> +               kallsyms_lookup_name(func);
> +       report_filterlist.sorted = false;
> +
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +}
> +
> +static int show_info(struct seq_file *file, void *v)
> +{
> +       int i;
> +       unsigned long flags;
> +
> +       /* show stats */
> +       seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
> +       for (i = 0; i < kcsan_counter_count; ++i)
> +               seq_printf(file, "%s: %ld\n", counter_to_name(i),
> +                          atomic_long_read(&counters[i]));
> +
> +       /* show filter functions, and filter type */
> +       spin_lock_irqsave(&report_filterlist_lock, flags);
> +       seq_printf(file, "\n%s functions: %s\n",
> +                  report_filterlist.whitelist ? "whitelisted" : "blacklisted",
> +                  report_filterlist.used == 0 ? "none" : "");
> +       for (i = 0; i < report_filterlist.used; ++i)
> +               seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
> +       spin_unlock_irqrestore(&report_filterlist_lock, flags);
> +
> +       return 0;
> +}
> +
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, show_info, NULL);
> +}
> +
> +static ssize_t debugfs_write(struct file *file, const char __user *buf,
> +                            size_t count, loff_t *off)
> +{
> +       char kbuf[KSYM_NAME_LEN];
> +       char *arg;
> +       int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
> +
> +       if (copy_from_user(kbuf, buf, read_len))
> +               return -EINVAL;
> +       kbuf[read_len] = '\0';
> +       arg = strstrip(kbuf);
> +
> +       if (!strncmp(arg, "on", sizeof("on") - 1))
> +               WRITE_ONCE(kcsan_enabled, true);
> +       else if (!strncmp(arg, "off", sizeof("off") - 1))
> +               WRITE_ONCE(kcsan_enabled, false);
> +       else if (!strncmp(arg, "whitelist", sizeof("whitelist") - 1))
> +               set_report_filterlist_whitelist(true);
> +       else if (!strncmp(arg, "blacklist", sizeof("blacklist") - 1))
> +               set_report_filterlist_whitelist(false);
> +       else if (arg[0] == '!')
> +               insert_report_filterlist(&arg[1]);
> +       else
> +               return -EINVAL;
> +
> +       return count;
> +}
> +
> +static const struct file_operations debugfs_ops = { .read = seq_read,
> +                                                   .open = debugfs_open,
> +                                                   .write = debugfs_write,
> +                                                   .release = single_release };
> +
> +void __init kcsan_debugfs_init(void)
> +{
> +       debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
> +}
> diff --git a/kernel/kcsan/encoding.h b/kernel/kcsan/encoding.h
> new file mode 100644
> index 000000000000..8f9b1ce0e59f
> --- /dev/null
> +++ b/kernel/kcsan/encoding.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_ENCODING_H
> +#define _MM_KCSAN_ENCODING_H
> +
> +#include <linux/bits.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include "kcsan.h"
> +
> +#define SLOT_RANGE PAGE_SIZE
> +#define INVALID_WATCHPOINT 0
> +#define CONSUMED_WATCHPOINT 1
> +
> +/*
> + * The maximum useful size of accesses for which we set up watchpoints is the
> + * max range of slots we check on an access.
> + */
> +#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
> +
> +/*
> + * Number of bits we use to store size info.
> + */
> +#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
> +/*
> + * This encoding for addresses discards the upper (1 for is-write + SIZE_BITS) bits;
> + * however, most 64-bit architectures do not use the full 64-bit address space.
> + * Also, in order for a false positive to be observable 2 things need to happen:
> + *
> + *     1. different addresses but with the same encoded address race;
> + *     2. and both map onto the same watchpoint slots;
> + *
> + * Both these are assumed to be very unlikely. However, in case it still
> + * happens, the report logic will filter out the false positive (see report.c).
> + */
> +#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
> +
> +/*
> + * Masks to set/retrieve the encoded data.
> + */
> +#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
> +#define WATCHPOINT_SIZE_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
> +#define WATCHPOINT_ADDR_MASK                                                   \
> +       GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
> +
> +static inline bool check_encodable(unsigned long addr, size_t size)
> +{
> +       return size <= MAX_ENCODABLE_SIZE;
> +}
> +
> +static inline long encode_watchpoint(unsigned long addr, size_t size,
> +                                    bool is_write)
> +{
> +       return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
> +                     (size << WATCHPOINT_ADDR_BITS) |
> +                     (addr & WATCHPOINT_ADDR_MASK));
> +}
> +
> +static inline bool decode_watchpoint(long watchpoint,
> +                                    unsigned long *addr_masked, size_t *size,
> +                                    bool *is_write)
> +{
> +       if (watchpoint == INVALID_WATCHPOINT ||
> +           watchpoint == CONSUMED_WATCHPOINT)
> +               return false;
> +
> +       *addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
> +       *size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
> +               WATCHPOINT_ADDR_BITS;
> +       *is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
> +
> +       return true;
> +}
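
To check my understanding of the layout (write bit on top, then size, then
the masked address), here is a round-trip sketch. It assumes a 64-bit long,
4 KiB pages and KCSAN_CHECK_ADJACENT == 1, so that bits_per(8192) gives 14
size bits. The masks are derived directly from ADDR_BITS, which is a
simplification and not a copy of the GENMASK expressions in the patch; the
wp_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_BITS  14				/* assumed: bits_per(8192) */
#define ADDR_BITS  (64 - 1 - SIZE_BITS)		/* 49 low bits for the address */
#define WRITE_MASK (UINT64_C(1) << 63)
#define ADDR_MASK  ((UINT64_C(1) << ADDR_BITS) - 1)
#define SIZE_MASK  (~(WRITE_MASK | ADDR_MASK))

static uint64_t wp_encode(uint64_t addr, size_t size, bool is_write)
{
	assert(size < (UINT64_C(1) << SIZE_BITS));
	return (is_write ? WRITE_MASK : 0) |
	       ((uint64_t)size << ADDR_BITS) |
	       (addr & ADDR_MASK);
}

static bool wp_decode(uint64_t wp, uint64_t *addr_masked, size_t *size,
		      bool *is_write)
{
	/* 0 (invalid) and 1 (consumed) are reserved states, not encodings. */
	if (wp == 0 || wp == 1)
		return false;

	*addr_masked = wp & ADDR_MASK;
	*size = (size_t)((wp & SIZE_MASK) >> ADDR_BITS);
	*is_write = (wp & WRITE_MASK) != 0;
	return true;
}
```

Only the low ADDR_BITS of the address survive the round trip, which is why
the lookup side compares against addr & WATCHPOINT_ADDR_MASK.
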
> +
> +/*
> + * Return watchpoint slot for an address.
> + */
> +static inline int watchpoint_slot(unsigned long addr)
> +{
> +       return (addr / PAGE_SIZE) % KCSAN_NUM_WATCHPOINTS;
> +}
> +
> +static inline bool matching_access(unsigned long addr1, size_t size1,
> +                                  unsigned long addr2, size_t size2)
> +{
> +       unsigned long end_range1 = addr1 + size1 - 1;
> +       unsigned long end_range2 = addr2 + size2 - 1;
> +
> +       return addr1 <= end_range2 && addr2 <= end_range1;
> +}
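
matching_access() is a closed-interval overlap test. A standalone sketch with
a few boundary cases, under the assumption that both sizes are non-zero
(ranges_overlap() is a hypothetical name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if [addr1, addr1 + size1) and [addr2, addr2 + size2) share a byte. */
static bool ranges_overlap(unsigned long addr1, size_t size1,
			   unsigned long addr2, size_t size2)
{
	unsigned long end1, end2;

	assert(size1 > 0 && size2 > 0);
	end1 = addr1 + size1 - 1;	/* last byte of range 1 */
	end2 = addr2 + size2 - 1;	/* last byte of range 2 */

	return addr1 <= end2 && addr2 <= end1;
}
```

A 4-byte access at 0x1000 overlaps a 2-byte access at 0x1002, but not a
4-byte access at 0x1004, since the two ranges only touch.
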
> +
> +#endif /* _MM_KCSAN_ENCODING_H */
> diff --git a/kernel/kcsan/kcsan.c b/kernel/kcsan/kcsan.c
> new file mode 100644
> index 000000000000..45cf2fffd8a0
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
> + * see Documentation/dev-tools/kcsan.rst.
> + */
> +
> +#include <linux/export.h>
> +
> +#include "kcsan.h"
> +
> +/*
> + * KCSAN uses the same instrumentation that is emitted by supported compilers
> + * for Thread Sanitizer (TSAN).
> + *
> + * When enabled, the compiler emits instrumentation calls (the functions
> + * prefixed with "__tsan" below) for all loads and stores that it generated;
> + * inline asm is not instrumented.
> + */
> +
> +#define DEFINE_TSAN_READ_WRITE(size)                                           \
> +       void __tsan_read##size(void *ptr)                                      \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_read##size);                                      \
> +       void __tsan_write##size(void *ptr)                                     \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_write##size)
> +
> +DEFINE_TSAN_READ_WRITE(1);
> +DEFINE_TSAN_READ_WRITE(2);
> +DEFINE_TSAN_READ_WRITE(4);
> +DEFINE_TSAN_READ_WRITE(8);
> +DEFINE_TSAN_READ_WRITE(16);
> +
> +/*
> + * Not all supported compiler versions distinguish aligned/unaligned accesses,
> + * but e.g. recent versions of Clang do.
> + */
> +#define DEFINE_TSAN_UNALIGNED_READ_WRITE(size)                                 \
> +       void __tsan_unaligned_read##size(void *ptr)                            \
> +       {                                                                      \
> +               __kcsan_check_read(ptr, size);                                 \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_read##size);                            \
> +       void __tsan_unaligned_write##size(void *ptr)                           \
> +       {                                                                      \
> +               __kcsan_check_write(ptr, size);                                \
> +       }                                                                      \
> +       EXPORT_SYMBOL(__tsan_unaligned_write##size)
> +
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(2);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(4);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(8);
> +DEFINE_TSAN_UNALIGNED_READ_WRITE(16);
> +
> +void __tsan_read_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_read(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_read_range);
> +
> +void __tsan_write_range(void *ptr, size_t size)
> +{
> +       __kcsan_check_write(ptr, size);
> +}
> +EXPORT_SYMBOL(__tsan_write_range);
> +
> +/*
> + * The below are not required by KCSAN, but can still be emitted by the compiler.
> + */
> +void __tsan_func_entry(void *call_pc)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_entry);
> +void __tsan_func_exit(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_func_exit);
> +void __tsan_init(void)
> +{
> +}
> +EXPORT_SYMBOL(__tsan_init);
> diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
> new file mode 100644
> index 000000000000..429479b3041d
> --- /dev/null
> +++ b/kernel/kcsan/kcsan.h
> @@ -0,0 +1,140 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _MM_KCSAN_KCSAN_H
> +#define _MM_KCSAN_KCSAN_H
> +
> +#include <linux/kcsan.h>
> +
> +/*
> + * Total number of watchpoints. An address range maps into a specific slot as
> + * specified in `encoding.h`. Although a larger number of watchpoints may not
> + * even be usable due to limited thread count, a larger value will improve
> + * performance due to reducing cache-line contention.
> + */
> +#define KCSAN_NUM_WATCHPOINTS 64
> +
> +/*
> + * The number of adjacent watchpoints to check; the purpose is 2-fold:
> + *
> + *     1. the address slot is already occupied, check if any adjacent slots are
> + *        free;
> + *     2. accesses that straddle a slot boundary due to size that exceeds a
> + *        slot's range may check adjacent slots if any watchpoint matches.
> + *
> + * Note that accesses with very large size may still miss a watchpoint; however,
> + * given this should be rare, this is a reasonable trade-off to make, since this
> + * will avoid:
> + *
> + *     1. excessive contention between watchpoint checks and setup;
> + *     2. larger number of simultaneous watchpoints without sacrificing
> + *        performance.
> + */
> +#define KCSAN_CHECK_ADJACENT 1
> +
> +/*
> + * Globally enable and disable KCSAN.
> + */
> +extern bool kcsan_enabled;
> +
> +/*
> + * Helper that returns true if access to ptr should be considered as an atomic
> + * access, even though it is not explicitly atomic.
> + */
> +bool kcsan_is_atomic(const volatile void *ptr);
> +
> +/*
> + * Initialize debugfs file.
> + */
> +void kcsan_debugfs_init(void);
> +
> +enum kcsan_counter_id {
> +       /*
> +        * Number of watchpoints currently in use.
> +        */
> +       kcsan_counter_used_watchpoints,
> +
> +       /*
> +        * Total number of watchpoints set up.
> +        */
> +       kcsan_counter_setup_watchpoints,
> +
> +       /*
> +        * Total number of data-races.
> +        */
> +       kcsan_counter_data_races,
> +
> +       /*
> +        * Number of times no watchpoints were available.
> +        */
> +       kcsan_counter_no_capacity,
> +
> +       /*
> +        * A thread checking a watchpoint raced with another checking thread;
> +        * only one will be reported.
> +        */
> +       kcsan_counter_report_races,
> +
> +       /*
> +        * Observed data value change, but writer thread unknown.
> +        */
> +       kcsan_counter_races_unknown_origin,
> +
> +       /*
> +        * The access cannot be encoded to a valid watchpoint.
> +        */
> +       kcsan_counter_unencodable_accesses,
> +
> +       /*
> +        * Watchpoint encoding caused a watchpoint to fire on mismatching
> +        * accesses.
> +        */
> +       kcsan_counter_encoding_false_positives,
> +
> +       kcsan_counter_count, /* number of counters */
> +};
> +
> +/*
> + * Increment/decrement counter with given id; avoid calling these in fast-path.
> + */
> +void kcsan_counter_inc(enum kcsan_counter_id id);
> +void kcsan_counter_dec(enum kcsan_counter_id id);
> +
> +/*
> + * Returns true if data-races in the function symbol that maps to addr (offsets
> + * are ignored) should *not* be reported.
> + */
> +bool kcsan_skip_report(unsigned long func_addr);
> +
> +enum kcsan_report_type {
> +       /*
> +        * The thread that set up the watchpoint and briefly stalled was
> +        * signalled that another thread triggered the watchpoint, and thus a
> +        * race was encountered.
> +        */
> +       kcsan_report_race_setup,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, therefore a race
> +        * was encountered.
> +        */
> +       kcsan_report_race_check,
> +
> +       /*
> +        * A thread encountered a watchpoint for the access, but the other
> +        * racing thread can no longer be signaled that a race occurred.
> +        */
> +       kcsan_report_race_check_race,
> +
> +       /*
> +        * No other thread was observed to race with the access, but the data
> +        * value before and after the stall differs.
> +        */
> +       kcsan_report_race_unknown_origin,
> +};
> +/*
> + * Print a race report from thread that encountered the race.
> + */
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type);
> +
> +#endif /* _MM_KCSAN_KCSAN_H */
> diff --git a/kernel/kcsan/report.c b/kernel/kcsan/report.c
> new file mode 100644
> index 000000000000..517db539e4e7
> --- /dev/null
> +++ b/kernel/kcsan/report.c
> @@ -0,0 +1,306 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/kernel.h>
> +#include <linux/preempt.h>
> +#include <linux/printk.h>
> +#include <linux/sched.h>
> +#include <linux/spinlock.h>
> +#include <linux/stacktrace.h>
> +
> +#include "kcsan.h"
> +#include "encoding.h"
> +
> +/*
> + * Max. number of stack entries to show in the report.
> + */
> +#define NUM_STACK_ENTRIES 16
> +
> +/*
> + * Other thread info: communicated from other racing thread to thread that set
> + * up the watchpoint, which then prints the complete report atomically. Only
> + * need one struct, as all threads should be serialized regardless to print
> + * the reports, with reporting being in the slow-path.
> + */
> +static struct {
> +       const volatile void *ptr;
> +       size_t size;
> +       bool is_write;
> +       int task_pid;
> +       int cpu_id;
> +       unsigned long stack_entries[NUM_STACK_ENTRIES];
> +       int num_stack_entries;
> +} other_info = { .ptr = NULL };
> +
> +static DEFINE_SPINLOCK(other_info_lock);
> +static DEFINE_SPINLOCK(report_lock);
> +
> +static bool set_or_lock_other_info(unsigned long *flags,
> +                                  const volatile void *ptr, size_t size,
> +                                  bool is_write, int cpu_id,
> +                                  enum kcsan_report_type type)
> +{
> +       if (type != kcsan_report_race_check && type != kcsan_report_race_setup)
> +               return true;
> +
> +       for (;;) {
> +               spin_lock_irqsave(&other_info_lock, *flags);
> +
> +               switch (type) {
> +               case kcsan_report_race_check:
> +                       if (other_info.ptr != NULL) {
> +                               /* still in use, retry */
> +                               break;
> +                       }
> +                       other_info.ptr = ptr;
> +                       other_info.size = size;
> +                       other_info.is_write = is_write;
> +                       other_info.task_pid =
> +                               in_task() ? task_pid_nr(current) : -1;
> +                       other_info.cpu_id = cpu_id;
> +                       other_info.num_stack_entries = stack_trace_save(
> +                               other_info.stack_entries, NUM_STACK_ENTRIES, 1);
> +                       /*
> +                        * other_info may now be consumed by thread we raced
> +                        * with.
> +                        */
> +                       spin_unlock_irqrestore(&other_info_lock, *flags);
> +                       return false;
> +
> +               case kcsan_report_race_setup:
> +                       if (other_info.ptr == NULL)
> +                               break; /* no data available yet, retry */
> +
> +                       /*
> +                        * First check if matching based on how watchpoint was
> +                        * encoded.
> +                        */
> +                       if (!matching_access((unsigned long)other_info.ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            other_info.size,
> +                                            (unsigned long)ptr &
> +                                                    WATCHPOINT_ADDR_MASK,
> +                                            size))
> +                               break; /* mismatching access, retry */
> +
> +                       if (!matching_access((unsigned long)other_info.ptr,
> +                                            other_info.size,
> +                                            (unsigned long)ptr, size)) {
> +                               /*
> +                                * If the actual accesses do not match, this was
> +                                * a false positive due to watchpoint encoding.
> +                                */
> +                               other_info.ptr = NULL; /* mark for reuse */
> +                               kcsan_counter_inc(
> +                                       kcsan_counter_encoding_false_positives);
> +                               spin_unlock_irqrestore(&other_info_lock,
> +                                                      *flags);
> +                               return false;
> +                       }
> +
> +                       /*
> +                        * Matching access: keep other_info locked, as this
> +                        * thread uses it to print the full report; unlocked in
> +                        * end_report.
> +                        */
> +                       return true;
> +
> +               default:
> +                       BUG();
> +               }
> +
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +       }
> +}
> +
> +static void start_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               /* irqsaved already via other_info_lock */
> +               spin_lock(&report_lock);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_lock_irqsave(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static void end_report(unsigned long *flags, enum kcsan_report_type type)
> +{
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               other_info.ptr = NULL; /* mark for reuse */
> +               spin_unlock(&report_lock);
> +               spin_unlock_irqrestore(&other_info_lock, *flags);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               spin_unlock_irqrestore(&report_lock, *flags);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +}
> +
> +static const char *get_access_type(bool is_write)
> +{
> +       return is_write ? "write" : "read";
> +}
> +
> +/* Return thread description: in task or interrupt. */
> +static const char *get_thread_desc(int task_id)
> +{
> +       if (task_id != -1) {
> +               static char buf[32]; /* safe: protected by report_lock */
> +
> +               snprintf(buf, sizeof(buf), "task %i", task_id);
> +               return buf;
> +       }
> +       return in_nmi() ? "NMI" : "interrupt";
> +}
> +
> +/* Helper to skip KCSAN-related functions in stack-trace. */
> +static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
> +{
> +       char buf[64];
> +       int skip = 0;
> +
> +       for (; skip < num_entries; ++skip) {
> +               snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
> +               if (!strnstr(buf, "csan_", sizeof(buf)) &&
> +                   !strnstr(buf, "tsan_", sizeof(buf)) &&
> +                   !strnstr(buf, "_once_size", sizeof(buf))) {
> +                       break;
> +               }
> +       }
> +       return skip;
> +}
> +
> +/* Compares symbolized strings of addr1 and addr2. */
> +static int sym_strcmp(void *addr1, void *addr2)
> +{
> +       char buf1[64];
> +       char buf2[64];
> +
> +       snprintf(buf1, sizeof(buf1), "%pS", addr1);
> +       snprintf(buf2, sizeof(buf2), "%pS", addr2);
> +       return strncmp(buf1, buf2, sizeof(buf1));
> +}
> +
> +/*
> + * Returns true if a report was generated, false otherwise.
> + */
> +static bool print_summary(const volatile void *ptr, size_t size, bool is_write,
> +                         int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
> +       int num_stack_entries =
> +               stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
> +       int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +       int other_skipnr;
> +
> +       /* Check if the top stackframe is in a blacklisted function. */
> +       if (kcsan_skip_report(stack_entries[skipnr]))
> +               return false;
> +       if (type == kcsan_report_race_setup) {
> +               other_skipnr = get_stack_skipnr(other_info.stack_entries,
> +                                               other_info.num_stack_entries);
> +               if (kcsan_skip_report(other_info.stack_entries[other_skipnr]))
> +                       return false;
> +       }
> +
> +       /* Print report header. */
> +       pr_err("==================================================================\n");
> +       switch (type) {
> +       case kcsan_report_race_setup: {
> +               void *this_fn = (void *)stack_entries[skipnr];
> +               void *other_fn = (void *)other_info.stack_entries[other_skipnr];
> +               int cmp;
> +
> +               /*
> +                * Order functions lexicographically for consistent bug titles.
> +                * Do not print offset of functions to keep title short.
> +                */
> +               cmp = sym_strcmp(other_fn, this_fn);
> +               pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
> +                      cmp < 0 ? other_fn : this_fn,
> +                      cmp < 0 ? this_fn : other_fn);
> +       } break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("BUG: KCSAN: data-race in %pS\n",
> +                      (void *)stack_entries[skipnr]);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +
> +       pr_err("\n");
> +
> +       /* Print information about the racing accesses. */
> +       switch (type) {
> +       case kcsan_report_race_setup:
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(other_info.is_write), other_info.ptr,
> +                      other_info.size, get_thread_desc(other_info.task_pid),
> +                      other_info.cpu_id);
> +
> +               /* Print the other thread's stack trace. */
> +               stack_trace_print(other_info.stack_entries + other_skipnr,
> +                                 other_info.num_stack_entries - other_skipnr,
> +                                 0);
> +
> +               pr_err("\n");
> +               pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       case kcsan_report_race_unknown_origin:
> +               pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
> +                      get_access_type(is_write), ptr, size,
> +                      get_thread_desc(in_task() ? task_pid_nr(current) : -1),
> +                      cpu_id);
> +               break;
> +
> +       default:
> +               BUG();
> +       }
> +       /* Print stack trace of this thread. */
> +       stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> +                         0);
> +
> +       /* Print report footer. */
> +       pr_err("\n");
> +       pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
> +       dump_stack_print_info(KERN_DEFAULT);
> +       pr_err("==================================================================\n");
> +
> +       return true;
> +}
> +
> +void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
> +                 int cpu_id, enum kcsan_report_type type)
> +{
> +       unsigned long flags = 0;
> +
> +       if (type == kcsan_report_race_check_race)
> +               return;
> +
> +       kcsan_disable_current();
> +       if (set_or_lock_other_info(&flags, ptr, size, is_write, cpu_id, type)) {
> +               start_report(&flags, type);
> +               if (print_summary(ptr, size, is_write, cpu_id, type) &&
> +                   panic_on_warn)
> +                       panic("panic_on_warn set ...\n");
> +               end_report(&flags, type);
> +       }
> +       kcsan_enable_current();
> +}
> diff --git a/kernel/kcsan/test.c b/kernel/kcsan/test.c
> new file mode 100644
> index 000000000000..68c896a24529
> --- /dev/null
> +++ b/kernel/kcsan/test.c
> @@ -0,0 +1,117 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/printk.h>
> +#include <linux/random.h>
> +#include <linux/types.h>
> +
> +#include "encoding.h"
> +
> +#define ITERS_PER_TEST 2000
> +
> +/* Test requirements. */
> +static bool test_requires(void)
> +{
> +       /* random should be initialized */
> +       return prandom_u32() + prandom_u32() != 0;
> +}
> +
> +/* Test watchpoint encode and decode. */
> +static bool test_encode_decode(void)
> +{
> +       int i;
> +
> +       for (i = 0; i < ITERS_PER_TEST; ++i) {
> +               size_t size = prandom_u32() % MAX_ENCODABLE_SIZE + 1;
> +               bool is_write = prandom_u32() % 2;
> +               unsigned long addr;
> +
> +               prandom_bytes(&addr, sizeof(addr));
> +               if (WARN_ON(!check_encodable(addr, size)))
> +                       return false;
> +
> +               /* encode and decode */
> +               {
> +                       const long encoded_watchpoint =
> +                               encode_watchpoint(addr, size, is_write);
> +                       unsigned long verif_masked_addr;
> +                       size_t verif_size;
> +                       bool verif_is_write;
> +
> +                       /* check special watchpoints */
> +                       if (WARN_ON(decode_watchpoint(
> +                                   INVALID_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(decode_watchpoint(
> +                                   CONSUMED_WATCHPOINT, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +
> +                       /* check decoding watchpoint returns same data */
> +                       if (WARN_ON(!decode_watchpoint(
> +                                   encoded_watchpoint, &verif_masked_addr,
> +                                   &verif_size, &verif_is_write)))
> +                               return false;
> +                       if (WARN_ON(verif_masked_addr !=
> +                                   (addr & WATCHPOINT_ADDR_MASK)))
> +                               goto fail;
> +                       if (WARN_ON(verif_size != size))
> +                               goto fail;
> +                       if (WARN_ON(is_write != verif_is_write))
> +                               goto fail;
> +
> +                       continue;
> +fail:
> +                       pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
> +                              __func__, is_write ? "write" : "read", size,
> +                              addr, encoded_watchpoint,
> +                              verif_is_write ? "write" : "read", verif_size,
> +                              verif_masked_addr);
> +                       return false;
> +               }
> +       }
> +
> +       return true;
> +}
> +
> +static bool test_matching_access(void)
> +{
> +       if (WARN_ON(!matching_access(10, 1, 10, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 2, 11, 1)))
> +               return false;
> +       if (WARN_ON(!matching_access(10, 1, 9, 2)))
> +               return false;
> +       if (WARN_ON(matching_access(10, 1, 11, 1)))
> +               return false;
> +       if (WARN_ON(matching_access(9, 1, 10, 1)))
> +               return false;
> +       return true;
> +}
> +
> +static int __init kcsan_selftest(void)
> +{
> +       int passed = 0;
> +       int total = 0;
> +
> +#define RUN_TEST(do_test)                                                      \
> +       do {                                                                   \
> +               ++total;                                                       \
> +               if (do_test())                                                 \
> +                       ++passed;                                              \
> +               else                                                           \
> +                       pr_err("KCSAN selftest: " #do_test " failed\n");       \
> +       } while (0)
> +
> +       RUN_TEST(test_requires);
> +       RUN_TEST(test_encode_decode);
> +       RUN_TEST(test_matching_access);
> +
> +       pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
> +       if (passed != total)
> +               panic("KCSAN selftests failed");
> +       return 0;
> +}
> +postcore_initcall(kcsan_selftest);
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 93d97f9b0157..35accd1d93de 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
>
>  source "lib/Kconfig.ubsan"
>
> +source "lib/Kconfig.kcsan"
> +
>  config ARCH_HAS_DEVMEM_IS_ALLOWED
>         bool
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> new file mode 100644
> index 000000000000..3e1f1acfb24b
> --- /dev/null
> +++ b/lib/Kconfig.kcsan
> @@ -0,0 +1,88 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config HAVE_ARCH_KCSAN
> +       bool
> +
> +menuconfig KCSAN
> +       bool "KCSAN: watchpoint-based dynamic data-race detector"
> +       depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
> +       default n
> +       help
> +         Kernel Concurrency Sanitizer is a dynamic data-race detector, which
> +         uses a watchpoint-based sampling approach to detect races.
> +
> +if KCSAN
> +
> +config KCSAN_SELFTEST
> +       bool "KCSAN: perform short selftests on boot"
> +       default y
> +       help
> +         Run KCSAN selftests on boot. On test failure, causes kernel to panic.
> +
> +config KCSAN_EARLY_ENABLE
> +       bool "KCSAN: early enable"
> +       default y
> +       help
> +         If KCSAN should be enabled globally as soon as possible. KCSAN can
> +         later be enabled/disabled via debugfs.
> +
> +config KCSAN_UDELAY_MAX_TASK
> +       int "KCSAN: maximum delay in microseconds (for tasks)"
> +       default 80
> +       help
> +         For tasks, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_UDELAY_MAX_INTERRUPT
> +       int "KCSAN: maximum delay in microseconds (for interrupts)"
> +       default 20
> +       help
> +         For interrupts, the max. microsecond delay after setting up a watchpoint.
> +
> +config KCSAN_DELAY_RANDOMIZE
> +       bool "KCSAN: randomize delays"
> +       default y
> +       help
> +         If delays should be randomized; if false, the chosen delay is simply
> +         the maximum values defined above.
> +
> +config KCSAN_WATCH_SKIP_INST
> +       int "KCSAN: watchpoint instruction skip"
> +       default 2000
> +       help
> +         The number of per-CPU memory operations to skip watching, before
> +         another watchpoint is set up; in other words, 1 in
> +         KCSAN_WATCH_SKIP_INST per-CPU memory operations are used to set up a
> +         watchpoint. A smaller value results in more aggressive race
> +         detection, whereas a larger value improves system performance at the
> +         cost of missing some races.
> +
> +config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
> +       bool "KCSAN: report races of unknown origin"
> +       default y
> +       help
> +         If KCSAN should report races where only one access is known, and the
> +         conflicting access is of unknown origin. This type of race is
> +         reported if it was only possible to infer a race due to a data-value
> +         change while an access is being delayed on a watchpoint.
> +
> +config KCSAN_IGNORE_ATOMICS
> +       bool "KCSAN: do not instrument marked atomic accesses"
> +       default n
> +       help
> +         If enabled, never instruments marked atomic accesses. This results in
> +         not reporting data-races where one access is atomic and the other is
> +         a plain access.
> +
> +config KCSAN_PLAIN_WRITE_PRETEND_ONCE
> +       bool "KCSAN: pretend plain writes are WRITE_ONCE"
> +       default n
> +       help
> +         This option makes KCSAN pretend that all plain writes are WRITE_ONCE.
> +         This option should only be used to prune initial data-races found in
> +         existing code.
> +
> +config KCSAN_DEBUG
> +       bool "Debugging of KCSAN internals"
> +       default n
> +
> +endif # KCSAN
> diff --git a/lib/Makefile b/lib/Makefile
> index c5892807e06f..778ab704e3ad 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
>  CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
>  endif
>
> +# Used by KCSAN while enabled, avoid recursion.
> +KCSAN_SANITIZE_random32.o := n
> +
>  lib-y := ctype.o string.o vsprintf.o cmdline.o \
>          rbtree.o radix-tree.o timerqueue.o xarray.o \
>          idr.o extable.o \
> diff --git a/scripts/Makefile.kcsan b/scripts/Makefile.kcsan
> new file mode 100644
> index 000000000000..caf1111a28ae
> --- /dev/null
> +++ b/scripts/Makefile.kcsan
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0
> +ifdef CONFIG_KCSAN
> +
> +CFLAGS_KCSAN := -fsanitize=thread
> +
> +endif # CONFIG_KCSAN
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 179d55af5852..0e78abab7d83 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
>         $(CFLAGS_KCOV))
>  endif
>
> +#
> +# Enable ConcurrencySanitizer flags for kernel except some files or directories
> +# we don't want to check (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
> +#
> +ifeq ($(CONFIG_KCSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> +       $(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
> +       $(CFLAGS_KCSAN))
> +endif
> +
>  # $(srctree)/$(src) for including checkin headers from generated source files
>  # $(objtree)/$(obj) for including generated headers from checkin source files
>  ifeq ($(KBUILD_EXTMOD),)
> --
> 2.23.0.866.gb869b98d4c-goog
>



* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-17 14:12   ` Marco Elver
  (?)
@ 2019-10-23 12:32     ` Dmitry Vyukov
  -1 siblings, 0 replies; 88+ messages in thread
From: Dmitry Vyukov @ 2019-10-23 12:32 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, H. Peter Anvin, Ingo Molnar, Jade Alglave,
	Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
	Mark Rutland, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi,
	open list:KERNEL BUILD + fi...,
	LKML, Linux-MM, the arch/x86 maintainers

On Thu, Oct 17, 2019 at 4:13 PM Marco Elver <elver@google.com> wrote:
>
> Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
> kernel space. KCSAN is a sampling watchpoint-based data-race detector.
> See the included Documentation/dev-tools/kcsan.rst for more details.

I think there is some significant potential for improving performance.
Currently __tsan_read8 does 2 function calls with push/pop, and the
second call is on an unpredicted slow path.
Then __kcsan_check_watchpoint and __kcsan_setup_watchpoint do a full
load of spills and lots of loads and checks that are not strictly
necessary or can be avoided. Additionally, __kcsan_setup_watchpoint
calls the non-inlined kcsan_is_atomic.
I think we need to try to structure it around the fast path as follows:
__tsan_read8 does no function calls and no spills on the fast path, both
for checking existing watchpoints and for checking whether a new
watchpoint needs to be set up. If it discovers a race with an existing
watchpoint or needs to set up a new one, those should be non-inlined tail
calls to the corresponding slow paths.
In particular, global enable/disable can be replaced with
occupying/freeing all watchpoints.
The per-CPU disabled check should be removed from the fast path somehow;
it's only used around debugging checks or during reporting. There should
be a way to check it on a slower path.
user_access_save should be removed from the fast path; we need it only
if we set up a watchpoint. But I am not sure why we need it at all, as we
should not be reading any user addresses.
should_watch should be restructured to decrement kcsan_skip first; if
it hits zero (with an unlikely hint), we go to the slow path. The slow
path resets kcsan_skip to something random. The comment mentions that
prandom_u32 is too expensive. Do I understand correctly that you
tried to call it on the fast path? I would expect it to be fine on the
slow path, and it will give us better randomness.
At this point we should return from __tsan_read8.
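
To illustrate the countdown structure described above, here is a minimal
user-space sketch, not kernel code. It assumes a thread-local counter
(skip_count) as a stand-in for the per-CPU kcsan_skip, and uses rand() where
the kernel would call prandom_u32(). The names check_access,
setup_watchpoint_slow and simulate_accesses are hypothetical and do not come
from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKIP_MAX 2000 /* assumed bound; mirrors the KCSAN_WATCH_SKIP_INST default */

/* Stand-in for the per-CPU kcsan_skip counter (assumption: thread-local here). */
static _Thread_local long skip_count = SKIP_MAX;
static unsigned long slow_path_hits;

/* Slow path: kept out of line, so it may spill registers and call the RNG. */
static __attribute__((noinline)) void
setup_watchpoint_slow(const volatile void *ptr, size_t size)
{
	(void)ptr;
	(void)size;
	/* A real slow path would set up a watchpoint and stall here. */
	skip_count = 1 + rand() % SKIP_MAX; /* re-arm with a random distance */
	slow_path_hits++;
}

/* Fast path: one decrement and one branch that is hinted as unlikely. */
static inline void check_access(const volatile void *ptr, size_t size)
{
	if (__builtin_expect(--skip_count <= 0, 0))
		setup_watchpoint_slow(ptr, size);
}

/* Drive n simulated accesses and return how often the slow path ran. */
static unsigned long simulate_accesses(unsigned long n)
{
	static int data;

	slow_path_hits = 0;
	for (unsigned long i = 0; i < n; i++)
		check_access(&data, sizeof(data));
	return slow_path_hits;
}
```

In this sketch the random number generator is only ever called from the
out-of-line function, so its cost is paid roughly once per SKIP_MAX accesses
at most, which is the property the proposal relies on.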

To measure performance, we could either do some synthetic in-kernel
benchmarks (e.g. writing something to the debugfs file, which will do
a number of memory accesses in a loop), or you may try these
user-space benchmarks:
https://github.com/google/sanitizers/blob/master/address-sanitizer/kernel_buildbot/slave/bench_readv.c
https://github.com/google/sanitizers/blob/master/address-sanitizer/kernel_buildbot/slave/bench_pipes.c

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-22 17:42       ` Marco Elver
@ 2019-10-23 16:24         ` Oleg Nesterov
  -1 siblings, 0 replies; 88+ messages in thread
From: Oleg Nesterov @ 2019-10-23 16:24 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Mark Rutland, Nicholas Piggin, Paul E. McKenney,
	Peter Zijlstra, Thomas Gleixner, Will Deacon, kasan-dev,
	linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On 10/22, Marco Elver wrote:
>
> On Tue, 22 Oct 2019 at 17:49, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Just for example. Suppose that task->state = TASK_UNINTERRUPTIBLE, this task
> > does __set_current_state(TASK_RUNNING), another CPU does wake_up_process(task)
> > which does the same UNINTERRUPTIBLE -> RUNNING transition.
> >
> > Looks like, this is the "data race" according to kcsan?
>
> Yes, they are "data races". They are probably not "race conditions" though.
>
> This is a fair distinction to make, and we never claimed to find "race
> conditions" only

I see, thanks, just wanted to be sure...

> KCSAN's goal is to find *data races* according to the LKMM.  Some data
> races are race conditions (usually the more interesting bugs) -- but
> not *all* data races are race conditions. Those are what are usually
> referred to as "benign", but they can still become bugs on the wrong
> arch/compiler combination. Hence, the need to annotate these accesses
> with READ_ONCE, WRITE_ONCE or use atomic_t:

Well, if I see READ_ONCE() in the code I want to understand why it was
used. Is it really needed for correctness or we want to shut up kcsan?
Say, why should wait_event(wq, *ptr) use READ_ONCE()? Nevermind, please
forget.
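
As a rough illustration of what "marking" an access means in this
discussion, the sketch below uses simplified user-space stand-ins for
READ_ONCE and WRITE_ONCE built on volatile casts. They are not the
kernel's actual definitions, which handle more cases, and flag, publish,
poll_flag and demo are hypothetical names. __typeof__ is a GCC/Clang
extension.

```c
#include <assert.h>

/* Simplified user-space stand-ins; NOT the kernel's definitions. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical shared flag, imagined to be accessed from several contexts. */
static int flag;

/* Writer side: the volatile store is emitted once and not elided. */
static void publish(void)
{
	WRITE_ONCE(flag, 1);
}

/* Reader side: the volatile load is performed on every call. */
static int poll_flag(void)
{
	return READ_ONCE(flag);
}

/* Single-threaded walk-through of the two helpers. */
static void demo(void)
{
	assert(poll_flag() == 0);
	publish();
	assert(poll_flag() == 1);
}
```

The annotation documents that the access is intentionally concurrent,
which is the property a data-race detector keys on; whether a given
annotation is also needed for correctness still has to be argued case by
case.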

Btw, why does __kcsan_check_watchpoint() do user_access_save() before
try_consume_watchpoint()?

Oleg.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure
  2019-10-23 16:24         ` Oleg Nesterov
  (?)
@ 2019-10-24 11:02           ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-24 11:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Mark Rutland, Nicholas Piggin, Paul E. McKenney,
	Peter Zijlstra, Thomas Gleixner, Will Deacon, kasan-dev,
	linux-arch, open list:DOCUMENTATION, linux-efi,
	Linux Kbuild mailing list, LKML, Linux Memory Management List,
	the arch/x86 maintainers

On Wed, 23 Oct 2019 at 18:24, Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 10/22, Marco Elver wrote:
> >
> > On Tue, 22 Oct 2019 at 17:49, Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > Just for example. Suppose that task->state = TASK_UNINTERRUPTIBLE, this task
> > > does __set_current_state(TASK_RUNNING), another CPU does wake_up_process(task)
> > > which does the same UNINTERRUPTIBLE -> RUNNING transition.
> > >
> > > Looks like, this is the "data race" according to kcsan?
> >
> > Yes, they are "data races". They are probably not "race conditions" though.
> >
> > This is a fair distinction to make, and we never claimed to find "race
> > conditions" only
>
> I see, thanks, just wanted to be sure...
>
> > KCSAN's goal is to find *data races* according to the LKMM.  Some data
> > races are race conditions (usually the more interesting bugs) -- but
> > not *all* data races are race conditions. Those are what are usually
> > referred to as "benign", but they can still become bugs on the wrong
> > arch/compiler combination. Hence, the need to annotate these accesses
> > with READ_ONCE, WRITE_ONCE or use atomic_t:
>
> Well, if I see READ_ONCE() in the code I want to understand why it was
> used. Is it really needed for correctness or we want to shut up kcsan?
> Say, why should wait_event(wq, *ptr) use READ_ONCE()? Nevermind, please
> forget.
>
> Btw, why does __kcsan_check_watchpoint() do user_access_save() before
> try_consume_watchpoint()?

Instrumentation is added in UACCESS regions. Since we do not access
user-memory, we do user_access_save to ensure everything is safe
(otherwise objtool complains that we do calls to non-whitelisted
functions). I will try to optimize this a bit, but we can't avoid it.

> Oleg.
>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 4/8] seqlock, kcsan: Add annotations for KCSAN
  2019-10-17 14:13   ` Marco Elver
  (?)
@ 2019-10-24 12:28   ` Mark Rutland
  2019-10-24 14:17       ` Marco Elver
  -1 siblings, 1 reply; 88+ messages in thread
From: Mark Rutland @ 2019-10-24 12:28 UTC (permalink / raw)
  To: Marco Elver
  Cc: akiyks, stern, glider, parri.andrea, andreyknvl, luto,
	ard.biesheuvel, arnd, boqun.feng, bp, dja, dlustig, dave.hansen,
	dhowells, dvyukov, hpa, mingo, j.alglave, joel, corbet, jpoimboe,
	luc.maranget, npiggin, paulmck, peterz, tglx, will, kasan-dev,
	linux-arch, linux-doc, linux-efi, linux-kbuild, linux-kernel,
	linux-mm, x86

On Thu, Oct 17, 2019 at 04:13:01PM +0200, Marco Elver wrote:
> Since seqlocks in the Linux kernel do not require the use of marked
> atomic accesses in critical sections, we teach KCSAN to assume such
> accesses are atomic. KCSAN currently also pretends that writes to
> `sequence` are atomic, although currently plain writes are used (their
> corresponding reads are READ_ONCE).
> 
> Further, to avoid false positives in the absence of clear ending of a
> seqlock reader critical section (only when using the raw interface),
> KCSAN assumes a fixed number of accesses after start of a seqlock
> critical section are atomic.

Do we have many examples where there's not a clear end to a seqlock
sequence? Or are there just a handful?

If there aren't that many, I wonder if we can make it mandatory to have
an explicit end, or to add some helper for those patterns so that we can
reliably hook them.
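
The best-effort scheme in the quoted commit message can be modelled
roughly as a budget of upcoming accesses that are treated as atomic. The
sketch below is a user-space model only: atomic_next_budget,
model_atomic_next and model_access_is_atomic are hypothetical names, and
the real kcsan_atomic_next implementation is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define KCSAN_SEQLOCK_REGION_MAX 1000

/* Hypothetical per-task state: upcoming accesses to treat as atomic. */
static int atomic_next_budget;

/* Models kcsan_atomic_next(n) as used in the patch: set the budget to n. */
static void model_atomic_next(int n)
{
	assert(n >= 0);
	atomic_next_budget = n;
}

/* Called once per instrumented access: consume budget if any remains. */
static bool model_access_is_atomic(void)
{
	if (atomic_next_budget > 0) {
		atomic_next_budget--;
		return true;
	}
	return false;
}
```

In this model a reader begin sets the budget to KCSAN_SEQLOCK_REGION_MAX
and a matching retry resets it to 0, so a reader with no clear end simply
runs the budget down instead of leaving the region open indefinitely.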

Thanks,
Mark.

> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
>  include/linux/seqlock.h | 44 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index bcf4cf26b8c8..1e425831a7ed 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -37,8 +37,24 @@
>  #include <linux/preempt.h>
>  #include <linux/lockdep.h>
>  #include <linux/compiler.h>
> +#include <linux/kcsan.h>
>  #include <asm/processor.h>
>  
> +/*
> + * The seqlock interface does not prescribe a precise sequence of read
> + * begin/retry/end. For readers, typically there is a call to
> + * read_seqcount_begin() and read_seqcount_retry(), however, there are more
> + * esoteric cases which do not follow this pattern.
> + *
> + * As a consequence, we take the following best-effort approach for *raw* usage
> + * of seqlocks under KCSAN: upon beginning a seq-reader critical section,
> + * pessimistically mark then next KCSAN_SEQLOCK_REGION_MAX memory accesses as
> + * atomics; if there is a matching read_seqcount_retry() call, no following
> + * memory operations are considered atomic. Non-raw usage of seqlocks is not
> + * affected.
> + */
> +#define KCSAN_SEQLOCK_REGION_MAX 1000
> +
>  /*
>   * Version using sequence counter only.
>   * This can be used when code has its own mutex protecting the
> @@ -115,6 +131,7 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
>  		cpu_relax();
>  		goto repeat;
>  	}
> +	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
>  	return ret;
>  }
>  
> @@ -131,6 +148,7 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
>  {
>  	unsigned ret = READ_ONCE(s->sequence);
>  	smp_rmb();
> +	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
>  	return ret;
>  }
>  
> @@ -183,6 +201,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
>  {
>  	unsigned ret = READ_ONCE(s->sequence);
>  	smp_rmb();
> +	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
>  	return ret & ~1;
>  }
>  
> @@ -202,7 +221,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
>   */
>  static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
>  {
> -	return unlikely(s->sequence != start);
> +	kcsan_atomic_next(0);
> +	return unlikely(READ_ONCE(s->sequence) != start);
>  }
>  
>  /**
> @@ -225,6 +245,7 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
>  
>  static inline void raw_write_seqcount_begin(seqcount_t *s)
>  {
> +	kcsan_begin_atomic(true);
>  	s->sequence++;
>  	smp_wmb();
>  }
> @@ -233,6 +254,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
>  {
>  	smp_wmb();
>  	s->sequence++;
> +	kcsan_end_atomic(true);
>  }
>  
>  /**
> @@ -262,18 +284,20 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
>   *
>   *      void write(void)
>   *      {
> - *              Y = true;
> + *              WRITE_ONCE(Y, true);
>   *
>   *              raw_write_seqcount_barrier(seq);
>   *
> - *              X = false;
> + *              WRITE_ONCE(X, false);
>   *      }
>   */
>  static inline void raw_write_seqcount_barrier(seqcount_t *s)
>  {
> +	kcsan_begin_atomic(true);
>  	s->sequence++;
>  	smp_wmb();
>  	s->sequence++;
> +	kcsan_end_atomic(true);
>  }
>  
>  static inline int raw_read_seqcount_latch(seqcount_t *s)
> @@ -398,7 +422,9 @@ static inline void write_seqcount_end(seqcount_t *s)
>  static inline void write_seqcount_invalidate(seqcount_t *s)
>  {
>  	smp_wmb();
> +	kcsan_begin_atomic(true);
>  	s->sequence+=2;
> +	kcsan_end_atomic(true);
>  }
>  
>  typedef struct {
> @@ -430,11 +456,21 @@ typedef struct {
>   */
>  static inline unsigned read_seqbegin(const seqlock_t *sl)
>  {
> -	return read_seqcount_begin(&sl->seqcount);
> +	unsigned ret = read_seqcount_begin(&sl->seqcount);
> +
> +	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry */
> +	kcsan_begin_atomic(false);
> +	return ret;
>  }
>  
>  static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
>  {
> +	/*
> +	 * Assume not nested: read_seqretry may be called multiple times when
> +	 * completing read critical section.
> +	 */
> +	kcsan_end_atomic(false);
> +
>  	return read_seqcount_retry(&sl->seqcount, start);
>  }
>  
> -- 
> 2.23.0.866.gb869b98d4c-goog
> 

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 4/8] seqlock, kcsan: Add annotations for KCSAN
  2019-10-24 12:28   ` Mark Rutland
  2019-10-24 14:17       ` Marco Elver
@ 2019-10-24 14:17       ` Marco Elver
  0 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-24 14:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi, Linux Kbuild mailing list,
	LKML, Linux Memory Management List, the arch/x86 maintainers

On Thu, 24 Oct 2019 at 14:28, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Oct 17, 2019 at 04:13:01PM +0200, Marco Elver wrote:
> > Since seqlocks in the Linux kernel do not require the use of marked
> > atomic accesses in critical sections, we teach KCSAN to assume such
> > accesses are atomic. KCSAN currently also pretends that writes to
> > `sequence` are atomic, although currently plain writes are used (their
> > corresponding reads are READ_ONCE).
> >
> > Further, to avoid false positives in the absence of clear ending of a
> > seqlock reader critical section (only when using the raw interface),
> > KCSAN assumes a fixed number of accesses after start of a seqlock
> > critical section are atomic.
>
> Do we have many examples where there's not a clear end to a seqlock
> sequence? Or are there just a handful?
>
> If there aren't that many, I wonder if we can make it mandatory to have
> an explicit end, or to add some helper for those patterns so that we can
> reliably hook them.

In an ideal world, all usage of seqlocks would be via seqlock_t, which
follows a somewhat saner usage pattern, and where we already do normal
begin/end markings. The subtle exception is that readers need to be flat
atomic regions, because of usage like this:
- fs/namespace.c:__legitimize_mnt - unbalanced read_seqretry
- fs/dcache.c:d_walk - unbalanced need_seqretry
But anything directly accessing seqcount_t seems to be unpredictable.
Filtering for usage of read_seqcount_retry not following 'do { .. }
while (read_seqcount_retry(..));' (although even the ones in while
loops aren't necessarily predictable):

$ git grep 'read_seqcount_retry' | grep -Ev 'seqlock.h|Doc|\* ' | grep -v 'while ('
=> about 1/3 of the total read_seqcount_retry usage.

Just looking at fs/namei.c, I would conclude that it'd be a pretty
daunting task to prescribe and migrate to an interface that forces
clear begin/end.

Which is why I concluded that for now, it is probably better to make
KCSAN play well with the existing code.

Thanks,
-- Marco

> Thanks,
> Mark.
>
> >
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> >  include/linux/seqlock.h | 44 +++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 40 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> > index bcf4cf26b8c8..1e425831a7ed 100644
> > --- a/include/linux/seqlock.h
> > +++ b/include/linux/seqlock.h
> > @@ -37,8 +37,24 @@
> >  #include <linux/preempt.h>
> >  #include <linux/lockdep.h>
> >  #include <linux/compiler.h>
> > +#include <linux/kcsan.h>
> >  #include <asm/processor.h>
> >
> > +/*
> > + * The seqlock interface does not prescribe a precise sequence of read
> > + * begin/retry/end. For readers, typically there is a call to
> > + * read_seqcount_begin() and read_seqcount_retry(), however, there are more
> > + * esoteric cases which do not follow this pattern.
> > + *
> > + * As a consequence, we take the following best-effort approach for *raw* usage
> > + * of seqlocks under KCSAN: upon beginning a seq-reader critical section,
> > + * pessimistically mark then next KCSAN_SEQLOCK_REGION_MAX memory accesses as
> > + * atomics; if there is a matching read_seqcount_retry() call, no following
> > + * memory operations are considered atomic. Non-raw usage of seqlocks is not
> > + * affected.
> > + */
> > +#define KCSAN_SEQLOCK_REGION_MAX 1000
> > +
> >  /*
> >   * Version using sequence counter only.
> >   * This can be used when code has its own mutex protecting the
> > @@ -115,6 +131,7 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
> >               cpu_relax();
> >               goto repeat;
> >       }
> > +     kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
> >       return ret;
> >  }
> >
> > @@ -131,6 +148,7 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
> >  {
> >       unsigned ret = READ_ONCE(s->sequence);
> >       smp_rmb();
> > +     kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
> >       return ret;
> >  }
> >
> > @@ -183,6 +201,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
> >  {
> >       unsigned ret = READ_ONCE(s->sequence);
> >       smp_rmb();
> > +     kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
> >       return ret & ~1;
> >  }
> >
> > @@ -202,7 +221,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
> >   */
> >  static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
> >  {
> > -     return unlikely(s->sequence != start);
> > +     kcsan_atomic_next(0);
> > +     return unlikely(READ_ONCE(s->sequence) != start);
> >  }
> >
> >  /**
> > @@ -225,6 +245,7 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
> >
> >  static inline void raw_write_seqcount_begin(seqcount_t *s)
> >  {
> > +     kcsan_begin_atomic(true);
> >       s->sequence++;
> >       smp_wmb();
> >  }
> > @@ -233,6 +254,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
> >  {
> >       smp_wmb();
> >       s->sequence++;
> > +     kcsan_end_atomic(true);
> >  }
> >
> >  /**
> > @@ -262,18 +284,20 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
> >   *
> >   *      void write(void)
> >   *      {
> > - *              Y = true;
> > + *              WRITE_ONCE(Y, true);
> >   *
> >   *              raw_write_seqcount_barrier(seq);
> >   *
> > - *              X = false;
> > + *              WRITE_ONCE(X, false);
> >   *      }
> >   */
> >  static inline void raw_write_seqcount_barrier(seqcount_t *s)
> >  {
> > +     kcsan_begin_atomic(true);
> >       s->sequence++;
> >       smp_wmb();
> >       s->sequence++;
> > +     kcsan_end_atomic(true);
> >  }
> >
> >  static inline int raw_read_seqcount_latch(seqcount_t *s)
> > @@ -398,7 +422,9 @@ static inline void write_seqcount_end(seqcount_t *s)
> >  static inline void write_seqcount_invalidate(seqcount_t *s)
> >  {
> >       smp_wmb();
> > +     kcsan_begin_atomic(true);
> >       s->sequence+=2;
> > +     kcsan_end_atomic(true);
> >  }
> >
> >  typedef struct {
> > @@ -430,11 +456,21 @@ typedef struct {
> >   */
> >  static inline unsigned read_seqbegin(const seqlock_t *sl)
> >  {
> > -     return read_seqcount_begin(&sl->seqcount);
> > +     unsigned ret = read_seqcount_begin(&sl->seqcount);
> > +
> > +     kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry */
> > +     kcsan_begin_atomic(false);
> > +     return ret;
> >  }
> >
> >  static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
> >  {
> > +     /*
> > +      * Assume not nested: read_seqretry may be called multiple times when
> > +      * completing read critical section.
> > +      */
> > +     kcsan_end_atomic(false);
> > +
> >       return read_seqcount_retry(&sl->seqcount, start);
> >  }
> >
> > --
> > 2.23.0.866.gb869b98d4c-goog
> >
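
As an aside for readers of the archive: the bookkeeping implied by the
annotations in the quoted patch can be modelled in plain userspace C. This is
only a sketch under assumptions. struct ctx_model and the model_* functions are
hypothetical names, not the kernel's implementation, and the split into a
countdown (kcsan_atomic_next), nestable regions (kcsan_begin_atomic(true)) and a
flat region (kcsan_begin_atomic(false)) is inferred from how the patch calls
those functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; a model, not the kernel's data structure. */
struct ctx_model {
	int atomic_next;       /* remaining accesses to treat as atomic */
	int atomic_nest_count; /* depth of nestable atomic regions (writers) */
	bool in_flat_atomic;   /* inside a flat, non-nestable region (readers) */
};

/* Models kcsan_atomic_next(n): n == 0 cancels any pending countdown. */
static void model_atomic_next(struct ctx_model *c, int n)
{
	c->atomic_next = n;
}

/* Models kcsan_begin_atomic(nest). */
static void model_begin_atomic(struct ctx_model *c, bool nest)
{
	if (nest)
		c->atomic_nest_count++;
	else
		c->in_flat_atomic = true;
}

/* Models kcsan_end_atomic(nest); ending a flat region twice is harmless. */
static void model_end_atomic(struct ctx_model *c, bool nest)
{
	if (nest) {
		assert(c->atomic_nest_count > 0);
		c->atomic_nest_count--;
	} else {
		c->in_flat_atomic = false;
	}
}

/* Decide whether the next plain access is treated as atomic. */
static bool model_access_is_atomic(struct ctx_model *c)
{
	if (c->atomic_next > 0) {
		c->atomic_next--; /* consume one slot of the countdown */
		return true;
	}
	return c->atomic_nest_count > 0 || c->in_flat_atomic;
}
```

In this model, a raw reader that calls model_atomic_next(&c, 1000) at begin and
model_atomic_next(&c, 0) at retry has only the accesses in between treated as
atomic. A flat region tolerates a repeated model_end_atomic(&c, false), which is
the property the comment in read_seqretry() relies on.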

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 4/8] seqlock, kcsan: Add annotations for KCSAN
  2019-10-24 14:17       ` Marco Elver
@ 2019-10-24 16:35         ` Mark Rutland
  -1 siblings, 0 replies; 88+ messages in thread
From: Mark Rutland @ 2019-10-24 16:35 UTC (permalink / raw)
  To: Marco Elver
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi, Linux Kbuild mailing list,
	LKML, Linux Memory Management List, the arch/x86 maintainers

On Thu, Oct 24, 2019 at 04:17:11PM +0200, Marco Elver wrote:
> On Thu, 24 Oct 2019 at 14:28, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Thu, Oct 17, 2019 at 04:13:01PM +0200, Marco Elver wrote:
> > > Since seqlocks in the Linux kernel do not require the use of marked
> > > atomic accesses in critical sections, we teach KCSAN to assume such
> > > accesses are atomic. KCSAN currently also pretends that writes to
> > > `sequence` are atomic, although currently plain writes are used (their
> > > corresponding reads are READ_ONCE).
> > >
> > > Further, to avoid false positives in the absence of a clear end to a
> > > seqlock reader critical section (only when using the raw interface),
> > > KCSAN assumes that a fixed number of accesses after the start of a
> > > seqlock critical section are atomic.
> >
> > Do we have many examples where there's not a clear end to a seqlock
> > sequence? Or are there just a handful?
> >
> > If there aren't that many, I wonder if we can make it mandatory to have
> > an explicit end, or to add some helper for those patterns so that we can
> > reliably hook them.
> 
> In an ideal world, all usage of seqlocks would be via seqlock_t, which
> follows a somewhat saner usage, where we already do normal begin/end
> markings -- with the subtle exception that readers need to be flat atomic
> regions, e.g. because of usage like this:
> - fs/namespace.c:__legitimize_mnt - unbalanced read_seqretry
> - fs/dcache.c:d_walk - unbalanced need_seqretry
> 
> But anything directly accessing seqcount_t seems to be unpredictable.
> Filtering for usage of read_seqcount_retry not following 'do { .. }
> while (read_seqcount_retry(..));' (although even the ones in while
> loops aren't necessarily predictable):
> 
> $ git grep 'read_seqcount_retry' | grep -Ev 'seqlock.h|Doc|\* ' | grep -v 'while ('
> => about 1/3 of the total read_seqcount_retry usage.
> 
> Just looking at fs/namei.c, I would conclude that it'd be a pretty
> daunting task to prescribe and migrate to an interface that forces
> clear begin/end.
> 
> Which is why I concluded that for now, it is probably better to make
> KCSAN play well with the existing code.

Thanks for the detailed explanation, it's very helpful.

That all sounds reasonable to me -- could you fold some of that into the
commit message?

Thanks,
Mark.
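
For reference, the reader shapes discussed above can be sketched in userspace C
with C11 atomics. This is a minimal, single-threaded illustration under
assumptions: struct seqcount_model, the model_* helpers, read_balanced() and
read_unbalanced() are hypothetical names, the default sequentially consistent
atomics stand in for the kernel's barriers, and the early-return reader is only
one possible shape of an "unbalanced" reader, not a copy of the fs/ code named
in the quoted text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct seqcount_model {
	atomic_uint sequence; /* odd while a writer is in progress */
};

/* Reader begin: wait for an even sequence and return it. */
static unsigned model_read_begin(struct seqcount_model *s)
{
	unsigned seq;

	while ((seq = atomic_load(&s->sequence)) & 1)
		; /* spin until the writer has finished */
	return seq;
}

/* Reader retry: true if a writer ran since model_read_begin(). */
static bool model_read_retry(struct seqcount_model *s, unsigned start)
{
	return atomic_load(&s->sequence) != start;
}

static void model_write_begin(struct seqcount_model *s)
{
	atomic_fetch_add(&s->sequence, 1); /* sequence becomes odd */
}

static void model_write_end(struct seqcount_model *s)
{
	unsigned prev = atomic_fetch_add(&s->sequence, 1);

	assert(prev & 1); /* must pair with model_write_begin() */
}

/* Balanced reader: every begin is closed by a retry in the same loop. */
static int read_balanced(struct seqcount_model *s, const int *data)
{
	unsigned start;
	int v;

	do {
		start = model_read_begin(s);
		v = *data; /* plain read; fine only in this single-threaded demo */
	} while (model_read_retry(s, start));
	return v;
}

/* Unbalanced reader: one path leaves without ever calling retry. */
static int read_unbalanced(struct seqcount_model *s, const int *data,
			   bool bail_early, bool *valid)
{
	unsigned start = model_read_begin(s);
	int v = *data;

	if (bail_early) {
		*valid = false; /* no retry call, so no end a tool could hook */
		return v;
	}
	*valid = !model_read_retry(s, start);
	return v;
}
```

In read_balanced() a tool can treat the retry call as the end of the critical
section. In read_unbalanced() the early-return path has no such call, which is
the situation the fixed access budget in the patch is meant to cover.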

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 4/8] seqlock, kcsan: Add annotations for KCSAN
  2019-10-24 16:35         ` Mark Rutland
  (?)
@ 2019-10-24 17:09           ` Marco Elver
  -1 siblings, 0 replies; 88+ messages in thread
From: Marco Elver @ 2019-10-24 17:09 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern,
	Alexander Potapenko, Andrea Parri, Andrey Konovalov,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Boqun Feng,
	Borislav Petkov, Daniel Axtens, Daniel Lustig, Dave Hansen,
	David Howells, Dmitry Vyukov, H. Peter Anvin, Ingo Molnar,
	Jade Alglave, Joel Fernandes, Jonathan Corbet, Josh Poimboeuf,
	Luc Maranget, Nicholas Piggin, Paul E. McKenney, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
	open list:DOCUMENTATION, linux-efi, Linux Kbuild mailing list,
	LKML, Linux Memory Management List, the arch/x86 maintainers

On Thu, 24 Oct 2019 at 18:35, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Thu, Oct 24, 2019 at 04:17:11PM +0200, Marco Elver wrote:
> > On Thu, 24 Oct 2019 at 14:28, Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 04:13:01PM +0200, Marco Elver wrote:
> > > > Since seqlocks in the Linux kernel do not require the use of marked
> > > > atomic accesses in critical sections, we teach KCSAN to assume such
> > > > accesses are atomic. KCSAN currently also pretends that writes to
> > > > `sequence` are atomic, although currently plain writes are used (their
> > > > corresponding reads are READ_ONCE).
> > > >
> > > > Further, to avoid false positives in the absence of a clear end to a
> > > > seqlock reader critical section (only when using the raw interface),
> > > > KCSAN assumes that a fixed number of accesses after the start of a
> > > > seqlock critical section are atomic.
> > >
> > > Do we have many examples where there's not a clear end to a seqlock
> > > sequence? Or are there just a handful?
> > >
> > > If there aren't that many, I wonder if we can make it mandatory to have
> > > an explicit end, or to add some helper for those patterns so that we can
> > > reliably hook them.
> >
> > In an ideal world, all usage of seqlocks would be via seqlock_t, which
> > follows a somewhat saner usage, where we already do normal begin/end
> > markings -- with the subtle exception that readers need to be flat atomic
> > regions, e.g. because of usage like this:
> > - fs/namespace.c:__legitimize_mnt - unbalanced read_seqretry
> > - fs/dcache.c:d_walk - unbalanced need_seqretry
> >
> > But anything directly accessing seqcount_t seems to be unpredictable.
> > Filtering for usage of read_seqcount_retry not following 'do { .. }
> > while (read_seqcount_retry(..));' (although even the ones in while
> > loops aren't necessarily predictable):
> >
> > $ git grep 'read_seqcount_retry' | grep -Ev 'seqlock.h|Doc|\* ' | grep -v 'while ('
> > => about 1/3 of the total read_seqcount_retry usage.
> >
> > Just looking at fs/namei.c, I would conclude that it'd be a pretty
> > daunting task to prescribe and migrate to an interface that forces
> > clear begin/end.
> >
> > Which is why I concluded that for now, it is probably better to make
> > KCSAN play well with the existing code.
>
> Thanks for the detailed explanation, it's very helpful.
>
> That all sounds reasonable to me -- could you fold some of that into the
> commit message?

Thanks, will do. (I hope to have v3 ready by some time next week.)

-- Marco

> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, newest message: 2019-10-24 17:09 UTC

Thread overview: 88+ messages
2019-10-17 14:12 [PATCH v2 0/8] Add Kernel Concurrency Sanitizer (KCSAN) Marco Elver
2019-10-17 14:12 ` [PATCH v2 1/8] kcsan: Add Kernel Concurrency Sanitizer infrastructure Marco Elver
2019-10-21 13:37   ` Alexander Potapenko
2019-10-21 15:54     ` Marco Elver
2019-10-22 14:11   ` Mark Rutland
2019-10-22 16:52     ` Marco Elver
2019-10-22 15:48   ` Oleg Nesterov
2019-10-22 17:42     ` Marco Elver
2019-10-23 16:24       ` Oleg Nesterov
2019-10-24 11:02         ` Marco Elver
2019-10-23  9:41   ` Dmitry Vyukov
2019-10-23  9:56   ` Dmitry Vyukov
2019-10-23 10:03   ` Dmitry Vyukov
2019-10-23 10:09   ` Dmitry Vyukov
2019-10-23 10:28   ` Dmitry Vyukov
2019-10-23 11:08   ` Dmitry Vyukov
2019-10-23 11:20   ` Dmitry Vyukov
2019-10-23 12:05   ` Dmitry Vyukov
2019-10-23 12:32   ` Dmitry Vyukov
2019-10-17 14:12 ` [PATCH v2 2/8] objtool, kcsan: Add KCSAN runtime functions to whitelist Marco Elver
2019-10-21 15:15   ` Dmitry Vyukov
2019-10-21 15:43     ` Marco Elver
2019-10-17 14:13 ` [PATCH v2 3/8] build, kcsan: Add KCSAN build exceptions Marco Elver
2019-10-17 14:13 ` [PATCH v2 4/8] seqlock, kcsan: Add annotations for KCSAN Marco Elver
2019-10-24 12:28   ` Mark Rutland
2019-10-24 14:17     ` Marco Elver
2019-10-24 16:35       ` Mark Rutland
2019-10-24 17:09         ` Marco Elver
2019-10-17 14:13 ` [PATCH v2 5/8] seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier Marco Elver
2019-10-17 14:13 ` [PATCH v2 6/8] asm-generic, kcsan: Add KCSAN instrumentation for bitops Marco Elver
2019-10-17 14:13 ` [PATCH v2 7/8] locking/atomics, kcsan: Add KCSAN instrumentation Marco Elver
2019-10-22 12:33   ` Mark Rutland
2019-10-22 18:17     ` Marco Elver
2019-10-17 14:13 ` [PATCH v2 8/8] x86, kcsan: Enable KCSAN for x86 Marco Elver
2019-10-22 12:59   ` Mark Rutland
2019-10-22 13:02     ` Marco Elver