* [PATCH v3 00/22] tracing vs world
@ 2020-02-19 14:47 Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
                   ` (21 more replies)
  0 siblings, 22 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Hi all,


These patches are the result of Mathieu and Steve trying to get commit
865e63b04e9b2 ("tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle
tracepoints") reverted again.

One of the things discovered is that tracing MUST NOT happen before nmi_enter()
or after nmi_exit(). The audit results from the previous version are still valid.

This then snowballed into auditing other exceptions, notably #MC and #BP. Lots
of patches came out of that.

I would love to have some tooling in this area. Dan, smatch has full callchains,
right? Would it be possible to have an __assert_no_tracing__() marker of sorts
that validates that no possible callchain reaching that assertion has hit
tracing before that point?

It would mean you have to handle the various means of 'notrace' annotation
(both the function attribute as well as the Makefile rules), recognising
tracepoints and ideally handling NOKPROBE annotations.

Changes since -v2:

 - #MC / ist_enter() audit -- first 4 patches. After this in_nmi() should
   always be set 'correctly'.
 - RCU IRQ enter/exit function simplification
 - #BP / poke_int3_handler() audit -- the bulk of the remaining patches.
 - pulled in some locking/kcsan patches

Changes since -v1:

 - Added tags
 - Changed #4; changed nmi_enter() to use __preempt_count_add() vs
   marking preempt_count_add() notrace.
 - Changed #5; confusion on which functions are notrace due to Makefile
 - Added #9; remove limitation on the perf-function-trace coupling


^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:31   ` Steven Rostedt
                     ` (3 more replies)
  2020-02-19 14:47 ` [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic() Peter Zijlstra
                   ` (20 subsequent siblings)
  21 siblings, 4 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Will Deacon, Marc Zyngier,
	Michael Ellerman, Petr Mladek

Since there are already a number of sites (ARM64, PowerPC) that
effectively nest nmi_enter(), let's make the primitive support this
before adding even more.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/hardirq.h |    4 ++--
 arch/arm64/kernel/sdei.c         |   14 ++------------
 arch/arm64/kernel/traps.c        |    8 ++------
 arch/powerpc/kernel/traps.c      |   19 +++++--------------
 include/linux/hardirq.h          |    2 +-
 include/linux/preempt.h          |    4 ++--
 kernel/printk/printk_safe.c      |    6 ++++--
 7 files changed, 18 insertions(+), 39 deletions(-)

--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -38,7 +38,7 @@ DECLARE_PER_CPU(struct nmi_ctx, nmi_cont
 
 #define arch_nmi_enter()							\
 	do {									\
-		if (is_kernel_in_hyp_mode()) {					\
+		if (is_kernel_in_hyp_mode() && !in_nmi()) {			\
 			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts);	\
 			nmi_ctx->hcr = read_sysreg(hcr_el2);			\
 			if (!(nmi_ctx->hcr & HCR_TGE)) {			\
@@ -50,7 +50,7 @@ DECLARE_PER_CPU(struct nmi_ctx, nmi_cont
 
 #define arch_nmi_exit()								\
 	do {									\
-		if (is_kernel_in_hyp_mode()) {					\
+		if (is_kernel_in_hyp_mode() && !in_nmi()) {			\
 			struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts);	\
 			if (!(nmi_ctx->hcr & HCR_TGE))				\
 				write_sysreg(nmi_ctx->hcr, hcr_el2);		\
--- a/arch/arm64/kernel/sdei.c
+++ b/arch/arm64/kernel/sdei.c
@@ -251,22 +251,12 @@ asmlinkage __kprobes notrace unsigned lo
 __sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
 {
 	unsigned long ret;
-	bool do_nmi_exit = false;
 
-	/*
-	 * nmi_enter() deals with printk() re-entrance and use of RCU when
-	 * RCU believed this CPU was idle. Because critical events can
-	 * interrupt normal events, we may already be in_nmi().
-	 */
-	if (!in_nmi()) {
-		nmi_enter();
-		do_nmi_exit = true;
-	}
+	nmi_enter();
 
 	ret = _sdei_handler(regs, arg);
 
-	if (do_nmi_exit)
-		nmi_exit();
+	nmi_exit();
 
 	return ret;
 }
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -906,17 +906,13 @@ bool arm64_is_fatal_ras_serror(struct pt
 
 asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
 {
-	const bool was_in_nmi = in_nmi();
-
-	if (!was_in_nmi)
-		nmi_enter();
+	nmi_enter();
 
 	/* non-RAS errors are not containable */
 	if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
 		arm64_serror_panic(regs, esr);
 
-	if (!was_in_nmi)
-		nmi_exit();
+	nmi_exit();
 }
 
 asmlinkage void enter_from_user_mode(void)
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -441,15 +441,9 @@ void hv_nmi_check_nonrecoverable(struct
 void system_reset_exception(struct pt_regs *regs)
 {
 	unsigned long hsrr0, hsrr1;
-	bool nested = in_nmi();
 	bool saved_hsrrs = false;
 
-	/*
-	 * Avoid crashes in case of nested NMI exceptions. Recoverability
-	 * is determined by RI and in_nmi
-	 */
-	if (!nested)
-		nmi_enter();
+	nmi_enter();
 
 	/*
 	 * System reset can interrupt code where HSRRs are live and MSR[RI]=1.
@@ -521,8 +515,7 @@ void system_reset_exception(struct pt_re
 		mtspr(SPRN_HSRR1, hsrr1);
 	}
 
-	if (!nested)
-		nmi_exit();
+	nmi_exit();
 
 	/* What should we do here? We could issue a shutdown or hard reset. */
 }
@@ -823,9 +816,8 @@ int machine_check_generic(struct pt_regs
 void machine_check_exception(struct pt_regs *regs)
 {
 	int recover = 0;
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+
+	nmi_enter();
 
 	__this_cpu_inc(irq_stat.mce_exceptions);
 
@@ -863,8 +855,7 @@ void machine_check_exception(struct pt_r
 	return;
 
 bail:
-	if (!nested)
-		nmi_exit();
+	nmi_exit();
 }
 
 void SMIException(struct pt_regs *regs)
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -71,7 +71,7 @@ extern void irq_exit(void);
 		printk_nmi_enter();				\
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
-		BUG_ON(in_nmi());				\
+		BUG_ON(in_nmi() == NMI_MASK);			\
 		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
 		rcu_nmi_enter();				\
 		trace_hardirq_enter();				\
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -26,13 +26,13 @@
  *         PREEMPT_MASK:	0x000000ff
  *         SOFTIRQ_MASK:	0x0000ff00
  *         HARDIRQ_MASK:	0x000f0000
- *             NMI_MASK:	0x00100000
+ *             NMI_MASK:	0x00f00000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
-#define NMI_BITS	1
+#define NMI_BITS	4
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -296,12 +296,14 @@ static __printf(1, 0) int vprintk_nmi(co
 
 void notrace printk_nmi_enter(void)
 {
-	this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
+	if (!in_nmi())
+		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
 }
 
 void notrace printk_nmi_exit(void)
 {
-	this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
+	if (!in_nmi())
+		this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
 }
 
 /*




* [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 17:13   ` Borislav Petkov
  2020-02-19 14:47 ` [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter() Peter Zijlstra
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

It is an abomination; and in preparation for removing the whole
ist_enter() thing, it needs to go.

Convert #MC over to using task_work_add() instead; it will run the
same code slightly later, on the return to user path of the same
exception.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/traps.h   |    2 -
 arch/x86/kernel/cpu/mce/core.c |   53 +++++++++++++++++++++++------------------
 arch/x86/kernel/traps.c        |   37 ----------------------------
 include/linux/sched.h          |    6 ++++
 4 files changed, 36 insertions(+), 62 deletions(-)

--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -123,8 +123,6 @@ asmlinkage void smp_irq_move_cleanup_int
 
 extern void ist_enter(struct pt_regs *regs);
 extern void ist_exit(struct pt_regs *regs);
-extern void ist_begin_non_atomic(struct pt_regs *regs);
-extern void ist_end_non_atomic(void);
 
 #ifdef CONFIG_VMAP_STACK
 void __noreturn handle_stack_overflow(const char *message,
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -42,6 +42,7 @@
 #include <linux/export.h>
 #include <linux/jump_label.h>
 #include <linux/set_memory.h>
+#include <linux/task_work.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -1084,23 +1085,6 @@ static void mce_clear_state(unsigned lon
 	}
 }
 
-static int do_memory_failure(struct mce *m)
-{
-	int flags = MF_ACTION_REQUIRED;
-	int ret;
-
-	pr_err("Uncorrected hardware memory error in user-access at %llx", m->addr);
-	if (!(m->mcgstatus & MCG_STATUS_RIPV))
-		flags |= MF_MUST_KILL;
-	ret = memory_failure(m->addr >> PAGE_SHIFT, flags);
-	if (ret)
-		pr_err("Memory error not recovered");
-	else
-		set_mce_nospec(m->addr >> PAGE_SHIFT);
-	return ret;
-}
-
-
 /*
  * Cases where we avoid rendezvous handler timeout:
  * 1) If this CPU is offline.
@@ -1202,6 +1186,29 @@ static void __mc_scan_banks(struct mce *
 	*m = *final;
 }
 
+static void mce_kill_me_now(struct callback_head *ch)
+{
+	force_sig(SIGBUS);
+}
+
+static void mce_kill_me_maybe(struct callback_head *cb)
+{
+	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
+	int flags = MF_ACTION_REQUIRED;
+
+	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
+	if (!(p->mce_status & MCG_STATUS_RIPV))
+		flags |= MF_MUST_KILL;
+
+	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags)) {
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
+		return;
+	}
+
+	pr_err("Memory error not recovered");
+	mce_kill_me_now(cb);
+}
+
 /*
  * The actual machine check handler. This only handles real
  * exceptions when something got corrupted coming in through int 18.
@@ -1344,13 +1351,13 @@ void do_machine_check(struct pt_regs *re
 
 	/* Fault was in user mode and we need to take some action */
 	if ((m.cs & 3) == 3) {
-		ist_begin_non_atomic(regs);
-		local_irq_enable();
+		current->mce_addr = m.addr;
+		current->mce_status = m.mcgstatus;
+		current->mce_kill_me.func = mce_kill_me_maybe;
+		if (kill_it)
+			current->mce_kill_me.func = mce_kill_me_now;
 
-		if (kill_it || do_memory_failure(&m))
-			force_sig(SIGBUS);
-		local_irq_disable();
-		ist_end_non_atomic();
+		task_work_add(current, &current->mce_kill_me, true);
 	} else {
 		if (!fixup_exception(regs, X86_TRAP_MC, error_code, 0))
 			mce_panic("Failed kernel mode recovery", &m, msg);
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -117,43 +117,6 @@ void ist_exit(struct pt_regs *regs)
 		rcu_nmi_exit();
 }
 
-/**
- * ist_begin_non_atomic() - begin a non-atomic section in an IST exception
- * @regs:	regs passed to the IST exception handler
- *
- * IST exception handlers normally cannot schedule.  As a special
- * exception, if the exception interrupted userspace code (i.e.
- * user_mode(regs) would return true) and the exception was not
- * a double fault, it can be safe to schedule.  ist_begin_non_atomic()
- * begins a non-atomic section within an ist_enter()/ist_exit() region.
- * Callers are responsible for enabling interrupts themselves inside
- * the non-atomic section, and callers must call ist_end_non_atomic()
- * before ist_exit().
- */
-void ist_begin_non_atomic(struct pt_regs *regs)
-{
-	BUG_ON(!user_mode(regs));
-
-	/*
-	 * Sanity check: we need to be on the normal thread stack.  This
-	 * will catch asm bugs and any attempt to use ist_preempt_enable
-	 * from double_fault.
-	 */
-	BUG_ON(!on_thread_stack());
-
-	preempt_enable_no_resched();
-}
-
-/**
- * ist_end_non_atomic() - begin a non-atomic section in an IST exception
- *
- * Ends a non-atomic section started with ist_begin_non_atomic().
- */
-void ist_end_non_atomic(void)
-{
-	preempt_disable();
-}
-
 int is_valid_bugaddr(unsigned long addr)
 {
 	unsigned short ud;
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1285,6 +1285,12 @@ struct task_struct {
 	unsigned long			prev_lowest_stack;
 #endif
 
+#ifdef CONFIG_X86_MCE
+	u64				mce_addr;
+	u64				mce_status;
+	struct callback_head		mce_kill_me;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.




* [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-20 10:54   ` Borislav Petkov
  2020-02-19 14:47 ` [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE Peter Zijlstra
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

A few exceptions (like #DB and #BP) can happen at any location in the
code, which means that tracers should treat events from these
exceptions as NMI-like. We could, for instance, be holding locks with
interrupts disabled.

Similarly, #MC is an actual NMI-like exception.

All of them use ist_enter(), which only concerns itself with RCU but
does not do any of the other setup that NMIs need. This means things
like:

	printk()
	  raw_spin_lock(&logbuf_lock);
	  <#DB/#BP/#MC>
	     printk()
	       raw_spin_lock(&logbuf_lock);

are entirely possible.

So replace ist_enter() with nmi_enter(). Also observe that any
nmi_enter() caller must be both notrace and NOKPROBE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/traps.h      |    3 --
 arch/x86/kernel/cpu/mce/core.c    |   15 +++++-----
 arch/x86/kernel/cpu/mce/p5.c      |    7 ++--
 arch/x86/kernel/cpu/mce/winchip.c |    7 ++--
 arch/x86/kernel/traps.c           |   57 +++++---------------------------------
 5 files changed, 24 insertions(+), 65 deletions(-)

--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -121,9 +121,6 @@ void smp_spurious_interrupt(struct pt_re
 void smp_error_interrupt(struct pt_regs *regs);
 asmlinkage void smp_irq_move_cleanup_interrupt(void);
 
-extern void ist_enter(struct pt_regs *regs);
-extern void ist_exit(struct pt_regs *regs);
-
 #ifdef CONFIG_VMAP_STACK
 void __noreturn handle_stack_overflow(const char *message,
 				      struct pt_regs *regs,
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1220,7 +1220,7 @@ static void mce_kill_me_maybe(struct cal
  * MCE broadcast. However some CPUs might be broken beyond repair,
  * so be always careful when synchronizing with others.
  */
-void do_machine_check(struct pt_regs *regs, long error_code)
+notrace void do_machine_check(struct pt_regs *regs, long error_code)
 {
 	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
@@ -1254,10 +1254,10 @@ void do_machine_check(struct pt_regs *re
 	 */
 	int lmce = 1;
 
-	if (__mc_check_crashing_cpu(cpu))
-		return;
+	nmi_enter();
 
-	ist_enter(regs);
+	if (__mc_check_crashing_cpu(cpu))
+		goto out;
 
 	this_cpu_inc(mce_exception_count);
 
@@ -1346,7 +1346,7 @@ void do_machine_check(struct pt_regs *re
 	sync_core();
 
 	if (worst != MCE_AR_SEVERITY && !kill_it)
-		goto out_ist;
+		goto out;
 
 	/* Fault was in user mode and we need to take some action */
 	if ((m.cs & 3) == 3) {
@@ -1362,10 +1362,11 @@ void do_machine_check(struct pt_regs *re
 			mce_panic("Failed kernel mode recovery", &m, msg);
 	}
 
-out_ist:
-	ist_exit(regs);
+out:
+	nmi_exit();
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
+NOKPROBE_SYMBOL(do_machine_check);
 
 #ifndef CONFIG_MEMORY_FAILURE
 int memory_failure(unsigned long pfn, int flags)
--- a/arch/x86/kernel/cpu/mce/p5.c
+++ b/arch/x86/kernel/cpu/mce/p5.c
@@ -20,11 +20,11 @@
 int mce_p5_enabled __read_mostly;
 
 /* Machine check handler for Pentium class Intel CPUs: */
-static void pentium_machine_check(struct pt_regs *regs, long error_code)
+static notrace void pentium_machine_check(struct pt_regs *regs, long error_code)
 {
 	u32 loaddr, hi, lotype;
 
-	ist_enter(regs);
+	nmi_enter();
 
 	rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
 	rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
@@ -39,8 +39,9 @@ static void pentium_machine_check(struct
 
 	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
-	ist_exit(regs);
+	nmi_exit();
 }
+NOKPROBE_SYMBOL(pentium_machine_check);
 
 /* Set up machine check reporting for processors with Intel style MCE: */
 void intel_p5_mcheck_init(struct cpuinfo_x86 *c)
--- a/arch/x86/kernel/cpu/mce/winchip.c
+++ b/arch/x86/kernel/cpu/mce/winchip.c
@@ -16,15 +16,16 @@
 #include "internal.h"
 
 /* Machine check handler for WinChip C6: */
-static void winchip_machine_check(struct pt_regs *regs, long error_code)
+static notrace void winchip_machine_check(struct pt_regs *regs, long error_code)
 {
-	ist_enter(regs);
+	nmi_enter();
 
 	pr_emerg("CPU0: Machine Check Exception.\n");
 	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
 
-	ist_exit(regs);
+	nmi_exit();
 }
+NOKPROBE_SYMBOL(winchip_machine_check);
 
 /* Set up machine check reporting on the Winchip C6 series */
 void winchip_mcheck_init(struct cpuinfo_x86 *c)
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -81,41 +81,6 @@ static inline void cond_local_irq_disabl
 		local_irq_disable();
 }
 
-/*
- * In IST context, we explicitly disable preemption.  This serves two
- * purposes: it makes it much less likely that we would accidentally
- * schedule in IST context and it will force a warning if we somehow
- * manage to schedule by accident.
- */
-void ist_enter(struct pt_regs *regs)
-{
-	if (user_mode(regs)) {
-		RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
-	} else {
-		/*
-		 * We might have interrupted pretty much anything.  In
-		 * fact, if we're a machine check, we can even interrupt
-		 * NMI processing.  We don't want in_nmi() to return true,
-		 * but we need to notify RCU.
-		 */
-		rcu_nmi_enter();
-	}
-
-	preempt_disable();
-
-	/* This code is a bit fragile.  Test it. */
-	RCU_LOCKDEP_WARN(!rcu_is_watching(), "ist_enter didn't work");
-}
-NOKPROBE_SYMBOL(ist_enter);
-
-void ist_exit(struct pt_regs *regs)
-{
-	preempt_enable_no_resched();
-
-	if (!user_mode(regs))
-		rcu_nmi_exit();
-}
-
 int is_valid_bugaddr(unsigned long addr)
 {
 	unsigned short ud;
@@ -306,7 +271,7 @@ dotraplinkage void do_double_fault(struc
 	 * The net result is that our #GP handler will think that we
 	 * entered from usermode with the bad user context.
 	 *
-	 * No need for ist_enter here because we don't use RCU.
+	 * No need for nmi_enter() here because we don't use RCU.
 	 */
 	if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
 		regs->cs == __KERNEL_CS &&
@@ -341,7 +306,7 @@ dotraplinkage void do_double_fault(struc
 	}
 #endif
 
-	ist_enter(regs);
+	nmi_enter();
 	notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
 
 	tsk->thread.error_code = error_code;
@@ -393,6 +358,7 @@ dotraplinkage void do_double_fault(struc
 	die("double fault", regs, error_code);
 	panic("Machine halted.");
 }
+NOKPROBE_SYMBOL(do_double_fault);
 #endif
 
 dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
@@ -534,14 +500,7 @@ dotraplinkage void notrace do_int3(struc
 	if (poke_int3_handler(regs))
 		return;
 
-	/*
-	 * Use ist_enter despite the fact that we don't use an IST stack.
-	 * We can be called from a kprobe in non-CONTEXT_KERNEL kernel
-	 * mode or even during context tracking state changes.
-	 *
-	 * This means that we can't schedule.  That's okay.
-	 */
-	ist_enter(regs);
+	nmi_enter();
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
 	if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
@@ -563,7 +522,7 @@ dotraplinkage void notrace do_int3(struc
 	cond_local_irq_disable(regs);
 
 exit:
-	ist_exit(regs);
+	nmi_exit();
 }
 NOKPROBE_SYMBOL(do_int3);
 
@@ -660,14 +619,14 @@ static bool is_sysenter_singlestep(struc
  *
  * May run on IST stack.
  */
-dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
+dotraplinkage notrace void do_debug(struct pt_regs *regs, long error_code)
 {
 	struct task_struct *tsk = current;
 	int user_icebp = 0;
 	unsigned long dr6;
 	int si_code;
 
-	ist_enter(regs);
+	nmi_enter();
 
 	get_debugreg(dr6, 6);
 	/*
@@ -760,7 +719,7 @@ dotraplinkage void do_debug(struct pt_re
 	debug_stack_usage_dec();
 
 exit:
-	ist_exit(regs);
+	nmi_exit();
 }
 NOKPROBE_SYMBOL(do_debug);
 




* [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (2 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:36   ` Steven Rostedt
  2020-02-19 15:47   ` Steven Rostedt
  2020-02-19 14:47 ` [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi() Peter Zijlstra
                   ` (17 subsequent siblings)
  21 siblings, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Hitting the tracer or a kprobe from #DF is 'interesting'; let's avoid
that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/traps.c   |    3 ++-
 arch/x86/lib/memcpy_32.c  |    7 ++++++-
 arch/x86/lib/memmove_64.S |    5 +++++
 3 files changed, 13 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -271,7 +271,8 @@ dotraplinkage void do_double_fault(struc
 	 * The net result is that our #GP handler will think that we
 	 * entered from usermode with the bad user context.
 	 *
-	 * No need for nmi_enter() here because we don't use RCU.
+	 * No need for nmi_enter() here because we don't call out to anything
+	 * except memmove() and that is notrace/NOKPROBE.
 	 */
 	if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
 		regs->cs == __KERNEL_CS &&
--- a/arch/x86/lib/memcpy_32.c
+++ b/arch/x86/lib/memcpy_32.c
@@ -21,7 +21,7 @@ __visible void *memset(void *s, int c, s
 }
 EXPORT_SYMBOL(memset);
 
-__visible void *memmove(void *dest, const void *src, size_t n)
+__visible notrace void *memmove(void *dest, const void *src, size_t n)
 {
 	int d0,d1,d2,d3,d4,d5;
 	char *ret = dest;
@@ -207,3 +207,8 @@ __visible void *memmove(void *dest, cons
 
 }
 EXPORT_SYMBOL(memmove);
+/*
+ * The double fault handler uses memmove(), do not mess with it or risk a
+ * triple fault.
+ */
+NOKPROBE_SYMBOL(memmove);
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -212,3 +212,8 @@ SYM_FUNC_END(__memmove)
 SYM_FUNC_END_ALIAS(memmove)
 EXPORT_SYMBOL(__memmove)
 EXPORT_SYMBOL(memmove)
+/*
+ * The double fault handler uses memmove(), do not mess with it or risk a
+ * triple fault.
+ */
+_ASM_NOKPROBE(__memmove)




* [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (3 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 16:31   ` Paul E. McKenney
  2020-02-19 14:47 ` [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson() Peter Zijlstra
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

From: Paul E. McKenney <paulmck@kernel.org>

The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
"irq" parameter that indicates whether these functions are invoked from
an irq handler (irq==true) or an NMI handler (irq==false).  However,
recent changes have applied notrace to a few critical functions such
that rcu_nmi_enter_common() and rcu_nmi_exit_common() may now rely
on in_nmi().  Note that in_nmi() works no differently than before,
but rather that tracing is now prohibited in code regions where in_nmi()
would incorrectly report NMI state.

This commit therefore removes the "irq" parameter and inlines
rcu_nmi_enter_common() and rcu_nmi_exit_common() into rcu_nmi_enter()
and rcu_nmi_exit(), respectively.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/rcu/tree.c |   45 ++++++++++++++-------------------------------
 1 file changed, 14 insertions(+), 31 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -614,16 +614,18 @@ void rcu_user_enter(void)
 }
 #endif /* CONFIG_NO_HZ_FULL */
 
-/*
+/**
+ * rcu_nmi_exit - inform RCU of exit from NMI context
+ *
  * If we are returning from the outermost NMI handler that interrupted an
  * RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting
  * to let the RCU grace-period handling know that the CPU is back to
  * being RCU-idle.
  *
- * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
+ * If you add or remove a call to rcu_nmi_exit(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
  */
-static __always_inline void rcu_nmi_exit_common(bool irq)
+void rcu_nmi_exit(void)
 {
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 
@@ -651,27 +653,16 @@ static __always_inline void rcu_nmi_exit
 	trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, atomic_read(&rdp->dynticks));
 	WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
 
-	if (irq)
+	if (!in_nmi())
 		rcu_prepare_for_idle();
 
 	rcu_dynticks_eqs_enter();
 
-	if (irq)
+	if (!in_nmi())
 		rcu_dynticks_task_enter();
 }
 
 /**
- * rcu_nmi_exit - inform RCU of exit from NMI context
- *
- * If you add or remove a call to rcu_nmi_exit(), be sure to test
- * with CONFIG_RCU_EQS_DEBUG=y.
- */
-void rcu_nmi_exit(void)
-{
-	rcu_nmi_exit_common(false);
-}
-
-/**
  * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
  *
  * Exit from an interrupt handler, which might possibly result in entering
@@ -693,7 +684,7 @@ void rcu_nmi_exit(void)
 void rcu_irq_exit(void)
 {
 	lockdep_assert_irqs_disabled();
-	rcu_nmi_exit_common(true);
+	rcu_nmi_exit();
 }
 
 /*
@@ -777,7 +768,7 @@ void rcu_user_exit(void)
 #endif /* CONFIG_NO_HZ_FULL */
 
 /**
- * rcu_nmi_enter_common - inform RCU of entry to NMI context
+ * rcu_nmi_enter - inform RCU of entry to NMI context
  * @irq: Is this call from rcu_irq_enter?
  *
  * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and
@@ -786,10 +777,10 @@ void rcu_user_exit(void)
  * long as the nesting level does not overflow an int.  (You will probably
  * run out of stack space first.)
  *
- * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
+ * If you add or remove a call to rcu_nmi_enter(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
  */
-static __always_inline void rcu_nmi_enter_common(bool irq)
+void rcu_nmi_enter(void)
 {
 	long incby = 2;
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
@@ -807,12 +798,12 @@ static __always_inline void rcu_nmi_ente
 	 */
 	if (rcu_dynticks_curr_cpu_in_eqs()) {
 
-		if (irq)
+		if (!in_nmi())
 			rcu_dynticks_task_exit();
 
 		rcu_dynticks_eqs_exit();
 
-		if (irq)
+		if (!in_nmi())
 			rcu_cleanup_after_idle();
 
 		incby = 1;
@@ -834,14 +825,6 @@ static __always_inline void rcu_nmi_ente
 		   rdp->dynticks_nmi_nesting + incby);
 	barrier();
 }
-
-/**
- * rcu_nmi_enter - inform RCU of entry to NMI context
- */
-void rcu_nmi_enter(void)
-{
-	rcu_nmi_enter_common(false);
-}
 NOKPROBE_SYMBOL(rcu_nmi_enter);
 
 /**
@@ -869,7 +852,7 @@ NOKPROBE_SYMBOL(rcu_nmi_enter);
 void rcu_irq_enter(void)
 {
 	lockdep_assert_irqs_disabled();
-	rcu_nmi_enter_common(true);
+	rcu_nmi_enter();
 }
 
 /*




* [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (4 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 16:38   ` Paul E. McKenney
  2020-02-19 14:47 ` [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline Peter Zijlstra
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

The functions do in fact use local_irq_{save,restore}() and can
therefore be used when IRQs are already disabled. Worse, they are
already used in places where IRQs are disabled, leading to great
confusion when reading the code.

Rename them to fix this confusion.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/rcupdate.h   |    4 ++--
 include/linux/rcutiny.h    |    4 ++--
 include/linux/rcutree.h    |    4 ++--
 include/linux/tracepoint.h |    4 ++--
 kernel/cpu_pm.c            |    4 ++--
 kernel/rcu/tree.c          |    8 ++++----
 kernel/trace/trace.c       |    4 ++--
 7 files changed, 16 insertions(+), 16 deletions(-)

--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -120,9 +120,9 @@ static inline void rcu_init_nohz(void) {
  */
 #define RCU_NONIDLE(a) \
 	do { \
-		rcu_irq_enter_irqson(); \
+		rcu_irq_enter_irqsave(); \
 		do { a; } while (0); \
-		rcu_irq_exit_irqson(); \
+		rcu_irq_exit_irqsave(); \
 	} while (0)
 
 /*
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -68,8 +68,8 @@ static inline int rcu_jiffies_till_stall
 static inline void rcu_idle_enter(void) { }
 static inline void rcu_idle_exit(void) { }
 static inline void rcu_irq_enter(void) { }
-static inline void rcu_irq_exit_irqson(void) { }
-static inline void rcu_irq_enter_irqson(void) { }
+static inline void rcu_irq_exit_irqsave(void) { }
+static inline void rcu_irq_enter_irqsave(void) { }
 static inline void rcu_irq_exit(void) { }
 static inline void exit_rcu(void) { }
 static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -46,8 +46,8 @@ void rcu_idle_enter(void);
 void rcu_idle_exit(void);
 void rcu_irq_enter(void);
 void rcu_irq_exit(void);
-void rcu_irq_enter_irqson(void);
-void rcu_irq_exit_irqson(void);
+void rcu_irq_enter_irqsave(void);
+void rcu_irq_exit_irqsave(void);
 
 void exit_rcu(void);
 
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -181,7 +181,7 @@ static inline struct tracepoint *tracepo
 		 */							\
 		if (rcuidle) {						\
 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-			rcu_irq_enter_irqson();				\
+			rcu_irq_enter_irqsave();			\
 		}							\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
@@ -195,7 +195,7 @@ static inline struct tracepoint *tracepo
 		}							\
 									\
 		if (rcuidle) {						\
-			rcu_irq_exit_irqson();				\
+			rcu_irq_exit_irqsave();				\
 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
 		}							\
 									\
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -24,10 +24,10 @@ static int cpu_pm_notify(enum cpu_pm_eve
 	 * could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let
 	 * RCU know this.
 	 */
-	rcu_irq_enter_irqson();
+	rcu_irq_enter_irqsave();
 	ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
 		nr_to_call, nr_calls);
-	rcu_irq_exit_irqson();
+	rcu_irq_exit_irqsave();
 
 	return notifier_to_errno(ret);
 }
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -699,10 +699,10 @@ void rcu_irq_exit(void)
 /*
  * Wrapper for rcu_irq_exit() where interrupts are enabled.
  *
- * If you add or remove a call to rcu_irq_exit_irqson(), be sure to test
+ * If you add or remove a call to rcu_irq_exit_irqsave(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_irq_exit_irqson(void)
+void rcu_irq_exit_irqsave(void)
 {
 	unsigned long flags;
 
@@ -875,10 +875,10 @@ void rcu_irq_enter(void)
 /*
  * Wrapper for rcu_irq_enter() where interrupts are enabled.
  *
- * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test
+ * If you add or remove a call to rcu_irq_enter_irqsave(), be sure to test
  * with CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_irq_enter_irqson(void)
+void rcu_irq_enter_irqsave(void)
 {
 	unsigned long flags;
 
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3004,9 +3004,9 @@ void __trace_stack(struct trace_array *t
 	if (unlikely(in_nmi()))
 		return;
 
-	rcu_irq_enter_irqson();
+	rcu_irq_enter_irqsave();
 	__ftrace_trace_stack(buffer, flags, skip, pc, NULL);
-	rcu_irq_exit_irqson();
+	rcu_irq_exit_irqsave();
 }
 
 /**




* [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (5 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 16:39   ` Paul E. McKenney
  2020-02-19 14:47 ` [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}() Peter Zijlstra
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Since rcu_is_watching() is notrace (and needs to be, as it can be
called from the tracers), make sure everything it in turn calls is
notrace too.

To that effect, mark rcu_dynticks_curr_cpu_in_eqs() inline, which
implies notrace, as the function is tiny.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/rcu/tree.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -294,7 +294,7 @@ static void rcu_dynticks_eqs_online(void
  *
  * No ordering, as we are sampling CPU-local information.
  */
-static bool rcu_dynticks_curr_cpu_in_eqs(void)
+static inline bool rcu_dynticks_curr_cpu_in_eqs(void)
 {
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 




* [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (6 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:49   ` Steven Rostedt
  2020-02-19 14:47 ` [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct Peter Zijlstra
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

To facilitate tracers that need RCU, add some helpers to wrap the
magic required.

The problem is that we can call into tracers (trace events and
function tracing) while RCU isn't watching and this can happen from
any context, including NMI.

It is the latter that causes most of the trouble: we must make sure
in_nmi() returns true before we land in any tracing code, otherwise
we cannot recover.

These helpers are macros because of header hell; they're placed here
because of their proximity to nmi_{enter,exit}().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/hardirq.h |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -89,4 +89,36 @@ extern void irq_exit(void);
 		arch_nmi_exit();				\
 	} while (0)
 
+/*
+ * Tracing vs RCU
+ * --------------
+ *
+ * tracepoints and function-tracing can happen when RCU isn't watching (idle,
+ * or early IRQ/NMI entry).
+ *
+ * When it happens during idle or early during IRQ entry, tracing will have
+ * to inform RCU that it ought to pay attention, this is done by calling
+ * rcu_irq_enter_irqsave().
+ *
+ * On NMI entry, we must be very careful that tracing only happens after we've
+ * incremented preempt_count(), otherwise we cannot tell we're in NMI and take
+ * the special path.
+ */
+
+#define trace_rcu_enter()					\
+({								\
+	unsigned long state = 0;				\
+	if (!rcu_is_watching())	{				\
+		rcu_irq_enter_irqsave();			\
+		state = 1;					\
+	}							\
+	state;							\
+})
+
+#define trace_rcu_exit(state)					\
+do {								\
+	if (state)						\
+		rcu_irq_exit_irqsave();				\
+} while (0)
+
 #endif /* LINUX_HARDIRQ_H */




* [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (7 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:50   ` Steven Rostedt
  2020-02-19 15:50   ` Steven Rostedt
  2020-02-19 14:47 ` [PATCH v3 10/22] x86,tracing: Add comments to do_nmi() Peter Zijlstra
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Steven Rostedt (VMware)

If we call into a tracer before in_nmi() becomes true, the tracer
cannot detect that it is running in NMI context and behave accordingly.

Therefore change nmi_{enter,exit}() to use __preempt_count_{add,sub}()
as the normal preempt_count_{add,sub}() have a (desired) function
trace entry.

This fixes a potential issue with current code; AFAICT when the
function-tracer has stack-tracing enabled __trace_stack() will
malfunction when it hits the preempt_count_add() function entry from
NMI context.

Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/hardirq.h |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -65,6 +65,15 @@ extern void irq_exit(void);
 #define arch_nmi_exit()		do { } while (0)
 #endif
 
+/*
+ * NMI vs Tracing
+ * --------------
+ *
+ * We must not land in a tracer until (or after) we've changed preempt_count
+ * such that in_nmi() becomes true. To that effect all NMI C entry points must
+ * be marked 'notrace' and call nmi_enter() as soon as possible.
+ */
+
 #define nmi_enter()						\
 	do {							\
 		arch_nmi_enter();				\
@@ -72,7 +81,7 @@ extern void irq_exit(void);
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
 		BUG_ON(in_nmi() == NMI_MASK);			\
-		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
 		rcu_nmi_enter();				\
 		trace_hardirq_enter();				\
 	} while (0)
@@ -82,7 +91,7 @@ extern void irq_exit(void);
 		trace_hardirq_exit();				\
 		rcu_nmi_exit();					\
 		BUG_ON(!in_nmi());				\
-		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
 		ftrace_nmi_exit();				\
 		lockdep_on();					\
 		printk_nmi_exit();				\




* [PATCH v3 10/22] x86,tracing: Add comments to do_nmi()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (8 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:51   ` Steven Rostedt
  2020-02-19 14:47 ` [PATCH v3 11/22] perf,tracing: Prepare the perf-trace interface for RCU changes Peter Zijlstra
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Add a few comments to do_nmi() as a result of the audit.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/nmi.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -529,11 +529,14 @@ do_nmi(struct pt_regs *regs, long error_
 	 * continue to use the NMI stack.
 	 */
 	if (unlikely(is_debug_stack(regs->sp))) {
-		debug_stack_set_zero();
+		debug_stack_set_zero(); /* notrace due to Makefile */
 		this_cpu_write(update_debug_stack, 1);
 	}
 #endif
 
+	/*
+	 * It is important that no tracing happens before nmi_enter()!
+	 */
 	nmi_enter();
 
 	inc_irq_stat(__nmi_count);
@@ -542,10 +545,13 @@ do_nmi(struct pt_regs *regs, long error_
 		default_do_nmi(regs);
 
 	nmi_exit();
+	/*
+	 * No tracing after nmi_exit()!
+	 */
 
 #ifdef CONFIG_X86_64
 	if (unlikely(this_cpu_read(update_debug_stack))) {
-		debug_stack_reset();
+		debug_stack_reset(); /* notrace due to Makefile */
 		this_cpu_write(update_debug_stack, 0);
 	}
 #endif




* [PATCH v3 11/22] perf,tracing: Prepare the perf-trace interface for RCU changes
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (9 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 10/22] x86,tracing: Add comments to do_nmi() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}() Peter Zijlstra
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Steven Rostedt (VMware)

The tracepoint interface will stop providing regular RCU context; make
sure we do it ourselves, since perf makes use of regular RCU protected
data.

Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8950,6 +8950,7 @@ void perf_tp_event(u16 event_type, u64 c
 {
 	struct perf_sample_data data;
 	struct perf_event *event;
+	unsigned long rcu_flags;
 
 	struct perf_raw_record raw = {
 		.frag = {
@@ -8961,6 +8962,8 @@ void perf_tp_event(u16 event_type, u64 c
 	perf_sample_data_init(&data, 0, 0);
 	data.raw = &raw;
 
+	rcu_flags = trace_rcu_enter();
+
 	perf_trace_buf_update(record, event_type);
 
 	hlist_for_each_entry_rcu(event, head, hlist_entry) {
@@ -8996,6 +8999,8 @@ void perf_tp_event(u16 event_type, u64 c
 	}
 
 	perf_swevent_put_recursion_context(rctx);
+
+	trace_rcu_exit(rcu_flags);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 




* [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}()
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (10 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 11/22] perf,tracing: Prepare the perf-trace interface for RCU changes Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:52   ` Steven Rostedt
  2020-02-19 14:47 ` [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again) Peter Zijlstra
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Replace the open-coded (and incomplete) RCU manipulations with the new
helpers to ensure a regular RCU context when calling into
__ftrace_trace_stack().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/trace/trace.c |   19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2989,24 +2989,11 @@ void __trace_stack(struct trace_array *t
 		   int pc)
 {
 	struct trace_buffer *buffer = tr->array_buffer.buffer;
+	unsigned long rcu_flags;
 
-	if (rcu_is_watching()) {
-		__ftrace_trace_stack(buffer, flags, skip, pc, NULL);
-		return;
-	}
-
-	/*
-	 * When an NMI triggers, RCU is enabled via rcu_nmi_enter(),
-	 * but if the above rcu_is_watching() failed, then the NMI
-	 * triggered someplace critical, and rcu_irq_enter() should
-	 * not be called from NMI.
-	 */
-	if (unlikely(in_nmi()))
-		return;
-
-	rcu_irq_enter_irqsave();
+	rcu_flags = trace_rcu_enter();
 	__ftrace_trace_stack(buffer, flags, skip, pc, NULL);
-	rcu_irq_exit_irqsave();
+	trace_rcu_exit(rcu_flags);
 }
 
 /**




* [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (11 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}() Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:53   ` Steven Rostedt
  2020-02-19 16:43   ` Paul E. McKenney
  2020-02-19 14:47 ` [PATCH v3 14/22] perf,tracing: Allow function tracing when !RCU Peter Zijlstra
                   ` (8 subsequent siblings)
  21 siblings, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
taught perf how to deal with not having an RCU context provided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/tracepoint.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -179,10 +179,8 @@ static inline struct tracepoint *tracepo
 		 * For rcuidle callers, use srcu since sched-rcu	\
 		 * doesn't work from the idle path.			\
 		 */							\
-		if (rcuidle) {						\
+		if (rcuidle)						\
 			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-			rcu_irq_enter_irqsave();			\
-		}							\
 									\
 		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
 									\
@@ -194,10 +192,8 @@ static inline struct tracepoint *tracepo
 			} while ((++it_func_ptr)->func);		\
 		}							\
 									\
-		if (rcuidle) {						\
-			rcu_irq_exit_irqsave();				\
+		if (rcuidle)						\
 			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
-		}							\
 									\
 		preempt_enable_notrace();				\
 	} while (0)




* [PATCH v3 14/22] perf,tracing: Allow function tracing when !RCU
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (12 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again) Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 15/22] x86/int3: Ensure that poke_int3_handler() is not traced Peter Zijlstra
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Since perf is now able to deal with !rcu_is_watching() contexts,
remove the restriction.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/trace/trace_event_perf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -477,7 +477,7 @@ static int perf_ftrace_function_register
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
 
-	ops->flags   = FTRACE_OPS_FL_RCU;
+	ops->flags   = 0;
 	ops->func    = perf_ftrace_function_call;
 	ops->private = (void *)(unsigned long)nr_cpu_ids;
 




* [PATCH v3 15/22] x86/int3: Ensure that poke_int3_handler() is not traced
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (13 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 14/22] perf,tracing: Allow function tracing when !RCU Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation Peter Zijlstra
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

From: Thomas Gleixner <tglx@linutronix.de>

In order to ensure poke_int3_handler() is completely self-contained --
we call it while modifying other text, imagine the fun of hitting
another INT3 -- make sure that everything it uses is not traced.

The primary means here is to force inlining; bsearch() is notrace
because all of lib/ is.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/ptrace.h        |    2 +-
 arch/x86/include/asm/text-patching.h |   11 +++++++----
 arch/x86/kernel/alternative.c        |   11 +++++++----
 3 files changed, 15 insertions(+), 9 deletions(-)

Index: linux-2.6/arch/x86/include/asm/ptrace.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/ptrace.h
+++ linux-2.6/arch/x86/include/asm/ptrace.h
@@ -123,7 +123,7 @@ static inline void regs_set_return_value
  * On x86_64, vm86 mode is mercifully nonexistent, and we don't need
  * the extra check.
  */
-static inline int user_mode(struct pt_regs *regs)
+static __always_inline int user_mode(struct pt_regs *regs)
 {
 #ifdef CONFIG_X86_32
 	return ((regs->cs & SEGMENT_RPL_MASK) | (regs->flags & X86_VM_MASK)) >= USER_RPL;
Index: linux-2.6/arch/x86/include/asm/text-patching.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/text-patching.h
+++ linux-2.6/arch/x86/include/asm/text-patching.h
@@ -64,7 +64,7 @@ extern void text_poke_finish(void);
 
 #define DISP32_SIZE		4
 
-static inline int text_opcode_size(u8 opcode)
+static __always_inline int text_opcode_size(u8 opcode)
 {
 	int size = 0;
 
@@ -118,12 +118,14 @@ extern __ro_after_init struct mm_struct
 extern __ro_after_init unsigned long poking_addr;
 
 #ifndef CONFIG_UML_X86
-static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
+static __always_inline
+void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
 {
 	regs->ip = ip;
 }
 
-static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
+static __always_inline
+void int3_emulate_push(struct pt_regs *regs, unsigned long val)
 {
 	/*
 	 * The int3 handler in entry_64.S adds a gap between the
@@ -138,7 +140,8 @@ static inline void int3_emulate_push(str
 	*(unsigned long *)regs->sp = val;
 }
 
-static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
+static __always_inline
+void int3_emulate_call(struct pt_regs *regs, unsigned long func)
 {
 	int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
 	int3_emulate_jmp(regs, func);
Index: linux-2.6/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/alternative.c
+++ linux-2.6/arch/x86/kernel/alternative.c
@@ -956,7 +956,8 @@ struct bp_patching_desc {
 
 static struct bp_patching_desc *bp_desc;
 
-static inline struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
+static __always_inline
+struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
 {
 	struct bp_patching_desc *desc = READ_ONCE(*descp); /* rcu_dereference */
 
@@ -966,13 +967,13 @@ static inline struct bp_patching_desc *t
 	return desc;
 }
 
-static inline void put_desc(struct bp_patching_desc *desc)
+static __always_inline void put_desc(struct bp_patching_desc *desc)
 {
 	smp_mb__before_atomic();
 	atomic_dec(&desc->refs);
 }
 
-static inline void *text_poke_addr(struct text_poke_loc *tp)
+static __always_inline void *text_poke_addr(struct text_poke_loc *tp)
 {
 	return _stext + tp->rel_addr;
 }




* [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (14 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 15/22] x86/int3: Ensure that poke_int3_handler() is not traced Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 15:46   ` Steven Rostedt
  2020-02-19 14:47 ` [PATCH v3 17/22] asm-generic/atomic: Use __always_inline for pure wrappers Peter Zijlstra
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Marco Elver, Mark Rutland

From: Marco Elver <elver@google.com>

This adds KCSAN instrumentation to atomic-instrumented.h.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[peterz: removed the actual kcsan hooks]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
---
 include/asm-generic/atomic-instrumented.h |  390 +++++++++++++++---------------
 scripts/atomic/gen-atomic-instrumented.sh |   14 -
 2 files changed, 212 insertions(+), 192 deletions(-)

--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -20,10 +20,20 @@
 #include <linux/build_bug.h>
 #include <linux/kasan-checks.h>
 
+static inline void __atomic_check_read(const volatile void *v, size_t size)
+{
+	kasan_check_read(v, size);
+}
+
+static inline void __atomic_check_write(const volatile void *v, size_t size)
+{
+	kasan_check_write(v, size);
+}
+
 static inline int
 atomic_read(const atomic_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic_read(v);
 }
 #define atomic_read atomic_read
@@ -32,7 +42,7 @@ atomic_read(const atomic_t *v)
 static inline int
 atomic_read_acquire(const atomic_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic_read_acquire(v);
 }
 #define atomic_read_acquire atomic_read_acquire
@@ -41,7 +51,7 @@ atomic_read_acquire(const atomic_t *v)
 static inline void
 atomic_set(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_set(v, i);
 }
 #define atomic_set atomic_set
@@ -50,7 +60,7 @@ atomic_set(atomic_t *v, int i)
 static inline void
 atomic_set_release(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_set_release(v, i);
 }
 #define atomic_set_release atomic_set_release
@@ -59,7 +69,7 @@ atomic_set_release(atomic_t *v, int i)
 static inline void
 atomic_add(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_add(i, v);
 }
 #define atomic_add atomic_add
@@ -68,7 +78,7 @@ atomic_add(int i, atomic_t *v)
 static inline int
 atomic_add_return(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return(i, v);
 }
 #define atomic_add_return atomic_add_return
@@ -78,7 +88,7 @@ atomic_add_return(int i, atomic_t *v)
 static inline int
 atomic_add_return_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_acquire(i, v);
 }
 #define atomic_add_return_acquire atomic_add_return_acquire
@@ -88,7 +98,7 @@ atomic_add_return_acquire(int i, atomic_
 static inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_release(i, v);
 }
 #define atomic_add_return_release atomic_add_return_release
@@ -98,7 +108,7 @@ atomic_add_return_release(int i, atomic_
 static inline int
 atomic_add_return_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_return_relaxed(i, v);
 }
 #define atomic_add_return_relaxed atomic_add_return_relaxed
@@ -108,7 +118,7 @@ atomic_add_return_relaxed(int i, atomic_
 static inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add(i, v);
 }
 #define atomic_fetch_add atomic_fetch_add
@@ -118,7 +128,7 @@ atomic_fetch_add(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_acquire(i, v);
 }
 #define atomic_fetch_add_acquire atomic_fetch_add_acquire
@@ -128,7 +138,7 @@ atomic_fetch_add_acquire(int i, atomic_t
 static inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_release(i, v);
 }
 #define atomic_fetch_add_release atomic_fetch_add_release
@@ -138,7 +148,7 @@ atomic_fetch_add_release(int i, atomic_t
 static inline int
 atomic_fetch_add_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_relaxed(i, v);
 }
 #define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
@@ -147,7 +157,7 @@ atomic_fetch_add_relaxed(int i, atomic_t
 static inline void
 atomic_sub(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_sub(i, v);
 }
 #define atomic_sub atomic_sub
@@ -156,7 +166,7 @@ atomic_sub(int i, atomic_t *v)
 static inline int
 atomic_sub_return(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return(i, v);
 }
 #define atomic_sub_return atomic_sub_return
@@ -166,7 +176,7 @@ atomic_sub_return(int i, atomic_t *v)
 static inline int
 atomic_sub_return_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_acquire(i, v);
 }
 #define atomic_sub_return_acquire atomic_sub_return_acquire
@@ -176,7 +186,7 @@ atomic_sub_return_acquire(int i, atomic_
 static inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_release(i, v);
 }
 #define atomic_sub_return_release atomic_sub_return_release
@@ -186,7 +196,7 @@ atomic_sub_return_release(int i, atomic_
 static inline int
 atomic_sub_return_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_return_relaxed(i, v);
 }
 #define atomic_sub_return_relaxed atomic_sub_return_relaxed
@@ -196,7 +206,7 @@ atomic_sub_return_relaxed(int i, atomic_
 static inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub(i, v);
 }
 #define atomic_fetch_sub atomic_fetch_sub
@@ -206,7 +216,7 @@ atomic_fetch_sub(int i, atomic_t *v)
 static inline int
 atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_acquire(i, v);
 }
 #define atomic_fetch_sub_acquire atomic_fetch_sub_acquire
@@ -216,7 +226,7 @@ atomic_fetch_sub_acquire(int i, atomic_t
 static inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_release(i, v);
 }
 #define atomic_fetch_sub_release atomic_fetch_sub_release
@@ -226,7 +236,7 @@ atomic_fetch_sub_release(int i, atomic_t
 static inline int
 atomic_fetch_sub_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_sub_relaxed(i, v);
 }
 #define atomic_fetch_sub_relaxed atomic_fetch_sub_relaxed
@@ -236,7 +246,7 @@ atomic_fetch_sub_relaxed(int i, atomic_t
 static inline void
 atomic_inc(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_inc(v);
 }
 #define atomic_inc atomic_inc
@@ -246,7 +256,7 @@ atomic_inc(atomic_t *v)
 static inline int
 atomic_inc_return(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return(v);
 }
 #define atomic_inc_return atomic_inc_return
@@ -256,7 +266,7 @@ atomic_inc_return(atomic_t *v)
 static inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_acquire(v);
 }
 #define atomic_inc_return_acquire atomic_inc_return_acquire
@@ -266,7 +276,7 @@ atomic_inc_return_acquire(atomic_t *v)
 static inline int
 atomic_inc_return_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_release(v);
 }
 #define atomic_inc_return_release atomic_inc_return_release
@@ -276,7 +286,7 @@ atomic_inc_return_release(atomic_t *v)
 static inline int
 atomic_inc_return_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_return_relaxed(v);
 }
 #define atomic_inc_return_relaxed atomic_inc_return_relaxed
@@ -286,7 +296,7 @@ atomic_inc_return_relaxed(atomic_t *v)
 static inline int
 atomic_fetch_inc(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc(v);
 }
 #define atomic_fetch_inc atomic_fetch_inc
@@ -296,7 +306,7 @@ atomic_fetch_inc(atomic_t *v)
 static inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_acquire(v);
 }
 #define atomic_fetch_inc_acquire atomic_fetch_inc_acquire
@@ -306,7 +316,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 static inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_release(v);
 }
 #define atomic_fetch_inc_release atomic_fetch_inc_release
@@ -316,7 +326,7 @@ atomic_fetch_inc_release(atomic_t *v)
 static inline int
 atomic_fetch_inc_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_inc_relaxed(v);
 }
 #define atomic_fetch_inc_relaxed atomic_fetch_inc_relaxed
@@ -326,7 +336,7 @@ atomic_fetch_inc_relaxed(atomic_t *v)
 static inline void
 atomic_dec(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_dec(v);
 }
 #define atomic_dec atomic_dec
@@ -336,7 +346,7 @@ atomic_dec(atomic_t *v)
 static inline int
 atomic_dec_return(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return(v);
 }
 #define atomic_dec_return atomic_dec_return
@@ -346,7 +356,7 @@ atomic_dec_return(atomic_t *v)
 static inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_acquire(v);
 }
 #define atomic_dec_return_acquire atomic_dec_return_acquire
@@ -356,7 +366,7 @@ atomic_dec_return_acquire(atomic_t *v)
 static inline int
 atomic_dec_return_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_release(v);
 }
 #define atomic_dec_return_release atomic_dec_return_release
@@ -366,7 +376,7 @@ atomic_dec_return_release(atomic_t *v)
 static inline int
 atomic_dec_return_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_return_relaxed(v);
 }
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
@@ -376,7 +386,7 @@ atomic_dec_return_relaxed(atomic_t *v)
 static inline int
 atomic_fetch_dec(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec(v);
 }
 #define atomic_fetch_dec atomic_fetch_dec
@@ -386,7 +396,7 @@ atomic_fetch_dec(atomic_t *v)
 static inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_acquire(v);
 }
 #define atomic_fetch_dec_acquire atomic_fetch_dec_acquire
@@ -396,7 +406,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 static inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_release(v);
 }
 #define atomic_fetch_dec_release atomic_fetch_dec_release
@@ -406,7 +416,7 @@ atomic_fetch_dec_release(atomic_t *v)
 static inline int
 atomic_fetch_dec_relaxed(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_dec_relaxed(v);
 }
 #define atomic_fetch_dec_relaxed atomic_fetch_dec_relaxed
@@ -415,7 +425,7 @@ atomic_fetch_dec_relaxed(atomic_t *v)
 static inline void
 atomic_and(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_and(i, v);
 }
 #define atomic_and atomic_and
@@ -424,7 +434,7 @@ atomic_and(int i, atomic_t *v)
 static inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and(i, v);
 }
 #define atomic_fetch_and atomic_fetch_and
@@ -434,7 +444,7 @@ atomic_fetch_and(int i, atomic_t *v)
 static inline int
 atomic_fetch_and_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_acquire(i, v);
 }
 #define atomic_fetch_and_acquire atomic_fetch_and_acquire
@@ -444,7 +454,7 @@ atomic_fetch_and_acquire(int i, atomic_t
 static inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_release(i, v);
 }
 #define atomic_fetch_and_release atomic_fetch_and_release
@@ -454,7 +464,7 @@ atomic_fetch_and_release(int i, atomic_t
 static inline int
 atomic_fetch_and_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_and_relaxed(i, v);
 }
 #define atomic_fetch_and_relaxed atomic_fetch_and_relaxed
@@ -464,7 +474,7 @@ atomic_fetch_and_relaxed(int i, atomic_t
 static inline void
 atomic_andnot(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_andnot(i, v);
 }
 #define atomic_andnot atomic_andnot
@@ -474,7 +484,7 @@ atomic_andnot(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot(i, v);
 }
 #define atomic_fetch_andnot atomic_fetch_andnot
@@ -484,7 +494,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
 static inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_acquire(i, v);
 }
 #define atomic_fetch_andnot_acquire atomic_fetch_andnot_acquire
@@ -494,7 +504,7 @@ atomic_fetch_andnot_acquire(int i, atomi
 static inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_release(i, v);
 }
 #define atomic_fetch_andnot_release atomic_fetch_andnot_release
@@ -504,7 +514,7 @@ atomic_fetch_andnot_release(int i, atomi
 static inline int
 atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_andnot_relaxed(i, v);
 }
 #define atomic_fetch_andnot_relaxed atomic_fetch_andnot_relaxed
@@ -513,7 +523,7 @@ atomic_fetch_andnot_relaxed(int i, atomi
 static inline void
 atomic_or(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_or(i, v);
 }
 #define atomic_or atomic_or
@@ -522,7 +532,7 @@ atomic_or(int i, atomic_t *v)
 static inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or(i, v);
 }
 #define atomic_fetch_or atomic_fetch_or
@@ -532,7 +542,7 @@ atomic_fetch_or(int i, atomic_t *v)
 static inline int
 atomic_fetch_or_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_acquire(i, v);
 }
 #define atomic_fetch_or_acquire atomic_fetch_or_acquire
@@ -542,7 +552,7 @@ atomic_fetch_or_acquire(int i, atomic_t
 static inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_release(i, v);
 }
 #define atomic_fetch_or_release atomic_fetch_or_release
@@ -552,7 +562,7 @@ atomic_fetch_or_release(int i, atomic_t
 static inline int
 atomic_fetch_or_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_or_relaxed(i, v);
 }
 #define atomic_fetch_or_relaxed atomic_fetch_or_relaxed
@@ -561,7 +571,7 @@ atomic_fetch_or_relaxed(int i, atomic_t
 static inline void
 atomic_xor(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic_xor(i, v);
 }
 #define atomic_xor atomic_xor
@@ -570,7 +580,7 @@ atomic_xor(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor(i, v);
 }
 #define atomic_fetch_xor atomic_fetch_xor
@@ -580,7 +590,7 @@ atomic_fetch_xor(int i, atomic_t *v)
 static inline int
 atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_acquire(i, v);
 }
 #define atomic_fetch_xor_acquire atomic_fetch_xor_acquire
@@ -590,7 +600,7 @@ atomic_fetch_xor_acquire(int i, atomic_t
 static inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_release(i, v);
 }
 #define atomic_fetch_xor_release atomic_fetch_xor_release
@@ -600,7 +610,7 @@ atomic_fetch_xor_release(int i, atomic_t
 static inline int
 atomic_fetch_xor_relaxed(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_xor_relaxed(i, v);
 }
 #define atomic_fetch_xor_relaxed atomic_fetch_xor_relaxed
@@ -610,7 +620,7 @@ atomic_fetch_xor_relaxed(int i, atomic_t
 static inline int
 atomic_xchg(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg(v, i);
 }
 #define atomic_xchg atomic_xchg
@@ -620,7 +630,7 @@ atomic_xchg(atomic_t *v, int i)
 static inline int
 atomic_xchg_acquire(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_acquire(v, i);
 }
 #define atomic_xchg_acquire atomic_xchg_acquire
@@ -630,7 +640,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
 static inline int
 atomic_xchg_release(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_release(v, i);
 }
 #define atomic_xchg_release atomic_xchg_release
@@ -640,7 +650,7 @@ atomic_xchg_release(atomic_t *v, int i)
 static inline int
 atomic_xchg_relaxed(atomic_t *v, int i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_xchg_relaxed(v, i);
 }
 #define atomic_xchg_relaxed atomic_xchg_relaxed
@@ -650,7 +660,7 @@ atomic_xchg_relaxed(atomic_t *v, int i)
 static inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg(v, old, new);
 }
 #define atomic_cmpxchg atomic_cmpxchg
@@ -660,7 +670,7 @@ atomic_cmpxchg(atomic_t *v, int old, int
 static inline int
 atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_acquire(v, old, new);
 }
 #define atomic_cmpxchg_acquire atomic_cmpxchg_acquire
@@ -670,7 +680,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int
 static inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_release(v, old, new);
 }
 #define atomic_cmpxchg_release atomic_cmpxchg_release
@@ -680,7 +690,7 @@ atomic_cmpxchg_release(atomic_t *v, int
 static inline int
 atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_cmpxchg_relaxed(v, old, new);
 }
 #define atomic_cmpxchg_relaxed atomic_cmpxchg_relaxed
@@ -690,8 +700,8 @@ atomic_cmpxchg_relaxed(atomic_t *v, int
 static inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg(v, old, new);
 }
 #define atomic_try_cmpxchg atomic_try_cmpxchg
@@ -701,8 +711,8 @@ atomic_try_cmpxchg(atomic_t *v, int *old
 static inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_acquire(v, old, new);
 }
 #define atomic_try_cmpxchg_acquire atomic_try_cmpxchg_acquire
@@ -712,8 +722,8 @@ atomic_try_cmpxchg_acquire(atomic_t *v,
 static inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_release(v, old, new);
 }
 #define atomic_try_cmpxchg_release atomic_try_cmpxchg_release
@@ -723,8 +733,8 @@ atomic_try_cmpxchg_release(atomic_t *v,
 static inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic_try_cmpxchg_relaxed(v, old, new);
 }
 #define atomic_try_cmpxchg_relaxed atomic_try_cmpxchg_relaxed
@@ -734,7 +744,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v,
 static inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_sub_and_test(i, v);
 }
 #define atomic_sub_and_test atomic_sub_and_test
@@ -744,7 +754,7 @@ atomic_sub_and_test(int i, atomic_t *v)
 static inline bool
 atomic_dec_and_test(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_and_test(v);
 }
 #define atomic_dec_and_test atomic_dec_and_test
@@ -754,7 +764,7 @@ atomic_dec_and_test(atomic_t *v)
 static inline bool
 atomic_inc_and_test(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_and_test(v);
 }
 #define atomic_inc_and_test atomic_inc_and_test
@@ -764,7 +774,7 @@ atomic_inc_and_test(atomic_t *v)
 static inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_negative(i, v);
 }
 #define atomic_add_negative atomic_add_negative
@@ -774,7 +784,7 @@ atomic_add_negative(int i, atomic_t *v)
 static inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_fetch_add_unless(v, a, u);
 }
 #define atomic_fetch_add_unless atomic_fetch_add_unless
@@ -784,7 +794,7 @@ atomic_fetch_add_unless(atomic_t *v, int
 static inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_add_unless(v, a, u);
 }
 #define atomic_add_unless atomic_add_unless
@@ -794,7 +804,7 @@ atomic_add_unless(atomic_t *v, int a, in
 static inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_not_zero(v);
 }
 #define atomic_inc_not_zero atomic_inc_not_zero
@@ -804,7 +814,7 @@ atomic_inc_not_zero(atomic_t *v)
 static inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_inc_unless_negative(v);
 }
 #define atomic_inc_unless_negative atomic_inc_unless_negative
@@ -814,7 +824,7 @@ atomic_inc_unless_negative(atomic_t *v)
 static inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_unless_positive(v);
 }
 #define atomic_dec_unless_positive atomic_dec_unless_positive
@@ -824,7 +834,7 @@ atomic_dec_unless_positive(atomic_t *v)
 static inline int
 atomic_dec_if_positive(atomic_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic_dec_if_positive(v);
 }
 #define atomic_dec_if_positive atomic_dec_if_positive
@@ -833,7 +843,7 @@ atomic_dec_if_positive(atomic_t *v)
 static inline s64
 atomic64_read(const atomic64_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic64_read(v);
 }
 #define atomic64_read atomic64_read
@@ -842,7 +852,7 @@ atomic64_read(const atomic64_t *v)
 static inline s64
 atomic64_read_acquire(const atomic64_t *v)
 {
-	kasan_check_read(v, sizeof(*v));
+	__atomic_check_read(v, sizeof(*v));
 	return arch_atomic64_read_acquire(v);
 }
 #define atomic64_read_acquire atomic64_read_acquire
@@ -851,7 +861,7 @@ atomic64_read_acquire(const atomic64_t *
 static inline void
 atomic64_set(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_set(v, i);
 }
 #define atomic64_set atomic64_set
@@ -860,7 +870,7 @@ atomic64_set(atomic64_t *v, s64 i)
 static inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_set_release(v, i);
 }
 #define atomic64_set_release atomic64_set_release
@@ -869,7 +879,7 @@ atomic64_set_release(atomic64_t *v, s64
 static inline void
 atomic64_add(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_add(i, v);
 }
 #define atomic64_add atomic64_add
@@ -878,7 +888,7 @@ atomic64_add(s64 i, atomic64_t *v)
 static inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return(i, v);
 }
 #define atomic64_add_return atomic64_add_return
@@ -888,7 +898,7 @@ atomic64_add_return(s64 i, atomic64_t *v
 static inline s64
 atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_acquire(i, v);
 }
 #define atomic64_add_return_acquire atomic64_add_return_acquire
@@ -898,7 +908,7 @@ atomic64_add_return_acquire(s64 i, atomi
 static inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_release(i, v);
 }
 #define atomic64_add_return_release atomic64_add_return_release
@@ -908,7 +918,7 @@ atomic64_add_return_release(s64 i, atomi
 static inline s64
 atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_return_relaxed(i, v);
 }
 #define atomic64_add_return_relaxed atomic64_add_return_relaxed
@@ -918,7 +928,7 @@ atomic64_add_return_relaxed(s64 i, atomi
 static inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add(i, v);
 }
 #define atomic64_fetch_add atomic64_fetch_add
@@ -928,7 +938,7 @@ atomic64_fetch_add(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_acquire(i, v);
 }
 #define atomic64_fetch_add_acquire atomic64_fetch_add_acquire
@@ -938,7 +948,7 @@ atomic64_fetch_add_acquire(s64 i, atomic
 static inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_release(i, v);
 }
 #define atomic64_fetch_add_release atomic64_fetch_add_release
@@ -948,7 +958,7 @@ atomic64_fetch_add_release(s64 i, atomic
 static inline s64
 atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_relaxed(i, v);
 }
 #define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
@@ -957,7 +967,7 @@ atomic64_fetch_add_relaxed(s64 i, atomic
 static inline void
 atomic64_sub(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_sub(i, v);
 }
 #define atomic64_sub atomic64_sub
@@ -966,7 +976,7 @@ atomic64_sub(s64 i, atomic64_t *v)
 static inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return(i, v);
 }
 #define atomic64_sub_return atomic64_sub_return
@@ -976,7 +986,7 @@ atomic64_sub_return(s64 i, atomic64_t *v
 static inline s64
 atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_acquire(i, v);
 }
 #define atomic64_sub_return_acquire atomic64_sub_return_acquire
@@ -986,7 +996,7 @@ atomic64_sub_return_acquire(s64 i, atomi
 static inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_release(i, v);
 }
 #define atomic64_sub_return_release atomic64_sub_return_release
@@ -996,7 +1006,7 @@ atomic64_sub_return_release(s64 i, atomi
 static inline s64
 atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_return_relaxed(i, v);
 }
 #define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
@@ -1006,7 +1016,7 @@ atomic64_sub_return_relaxed(s64 i, atomi
 static inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub(i, v);
 }
 #define atomic64_fetch_sub atomic64_fetch_sub
@@ -1016,7 +1026,7 @@ atomic64_fetch_sub(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_acquire(i, v);
 }
 #define atomic64_fetch_sub_acquire atomic64_fetch_sub_acquire
@@ -1026,7 +1036,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic
 static inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_release(i, v);
 }
 #define atomic64_fetch_sub_release atomic64_fetch_sub_release
@@ -1036,7 +1046,7 @@ atomic64_fetch_sub_release(s64 i, atomic
 static inline s64
 atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_sub_relaxed(i, v);
 }
 #define atomic64_fetch_sub_relaxed atomic64_fetch_sub_relaxed
@@ -1046,7 +1056,7 @@ atomic64_fetch_sub_relaxed(s64 i, atomic
 static inline void
 atomic64_inc(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_inc(v);
 }
 #define atomic64_inc atomic64_inc
@@ -1056,7 +1066,7 @@ atomic64_inc(atomic64_t *v)
 static inline s64
 atomic64_inc_return(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return(v);
 }
 #define atomic64_inc_return atomic64_inc_return
@@ -1066,7 +1076,7 @@ atomic64_inc_return(atomic64_t *v)
 static inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_acquire(v);
 }
 #define atomic64_inc_return_acquire atomic64_inc_return_acquire
@@ -1076,7 +1086,7 @@ atomic64_inc_return_acquire(atomic64_t *
 static inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_release(v);
 }
 #define atomic64_inc_return_release atomic64_inc_return_release
@@ -1086,7 +1096,7 @@ atomic64_inc_return_release(atomic64_t *
 static inline s64
 atomic64_inc_return_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_return_relaxed(v);
 }
 #define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
@@ -1096,7 +1106,7 @@ atomic64_inc_return_relaxed(atomic64_t *
 static inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc(v);
 }
 #define atomic64_fetch_inc atomic64_fetch_inc
@@ -1106,7 +1116,7 @@ atomic64_fetch_inc(atomic64_t *v)
 static inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_acquire(v);
 }
 #define atomic64_fetch_inc_acquire atomic64_fetch_inc_acquire
@@ -1116,7 +1126,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v
 static inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_release(v);
 }
 #define atomic64_fetch_inc_release atomic64_fetch_inc_release
@@ -1126,7 +1136,7 @@ atomic64_fetch_inc_release(atomic64_t *v
 static inline s64
 atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_inc_relaxed(v);
 }
 #define atomic64_fetch_inc_relaxed atomic64_fetch_inc_relaxed
@@ -1136,7 +1146,7 @@ atomic64_fetch_inc_relaxed(atomic64_t *v
 static inline void
 atomic64_dec(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_dec(v);
 }
 #define atomic64_dec atomic64_dec
@@ -1146,7 +1156,7 @@ atomic64_dec(atomic64_t *v)
 static inline s64
 atomic64_dec_return(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return(v);
 }
 #define atomic64_dec_return atomic64_dec_return
@@ -1156,7 +1166,7 @@ atomic64_dec_return(atomic64_t *v)
 static inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_acquire(v);
 }
 #define atomic64_dec_return_acquire atomic64_dec_return_acquire
@@ -1166,7 +1176,7 @@ atomic64_dec_return_acquire(atomic64_t *
 static inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_release(v);
 }
 #define atomic64_dec_return_release atomic64_dec_return_release
@@ -1176,7 +1186,7 @@ atomic64_dec_return_release(atomic64_t *
 static inline s64
 atomic64_dec_return_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_return_relaxed(v);
 }
 #define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
@@ -1186,7 +1196,7 @@ atomic64_dec_return_relaxed(atomic64_t *
 static inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec(v);
 }
 #define atomic64_fetch_dec atomic64_fetch_dec
@@ -1196,7 +1206,7 @@ atomic64_fetch_dec(atomic64_t *v)
 static inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_acquire(v);
 }
 #define atomic64_fetch_dec_acquire atomic64_fetch_dec_acquire
@@ -1206,7 +1216,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v
 static inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_release(v);
 }
 #define atomic64_fetch_dec_release atomic64_fetch_dec_release
@@ -1216,7 +1226,7 @@ atomic64_fetch_dec_release(atomic64_t *v
 static inline s64
 atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_dec_relaxed(v);
 }
 #define atomic64_fetch_dec_relaxed atomic64_fetch_dec_relaxed
@@ -1225,7 +1235,7 @@ atomic64_fetch_dec_relaxed(atomic64_t *v
 static inline void
 atomic64_and(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_and(i, v);
 }
 #define atomic64_and atomic64_and
@@ -1234,7 +1244,7 @@ atomic64_and(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and(i, v);
 }
 #define atomic64_fetch_and atomic64_fetch_and
@@ -1244,7 +1254,7 @@ atomic64_fetch_and(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_acquire(i, v);
 }
 #define atomic64_fetch_and_acquire atomic64_fetch_and_acquire
@@ -1254,7 +1264,7 @@ atomic64_fetch_and_acquire(s64 i, atomic
 static inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_release(i, v);
 }
 #define atomic64_fetch_and_release atomic64_fetch_and_release
@@ -1264,7 +1274,7 @@ atomic64_fetch_and_release(s64 i, atomic
 static inline s64
 atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_and_relaxed(i, v);
 }
 #define atomic64_fetch_and_relaxed atomic64_fetch_and_relaxed
@@ -1274,7 +1284,7 @@ atomic64_fetch_and_relaxed(s64 i, atomic
 static inline void
 atomic64_andnot(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_andnot(i, v);
 }
 #define atomic64_andnot atomic64_andnot
@@ -1284,7 +1294,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot(i, v);
 }
 #define atomic64_fetch_andnot atomic64_fetch_andnot
@@ -1294,7 +1304,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t
 static inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_acquire(i, v);
 }
 #define atomic64_fetch_andnot_acquire atomic64_fetch_andnot_acquire
@@ -1304,7 +1314,7 @@ atomic64_fetch_andnot_acquire(s64 i, ato
 static inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_release(i, v);
 }
 #define atomic64_fetch_andnot_release atomic64_fetch_andnot_release
@@ -1314,7 +1324,7 @@ atomic64_fetch_andnot_release(s64 i, ato
 static inline s64
 atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_andnot_relaxed(i, v);
 }
 #define atomic64_fetch_andnot_relaxed atomic64_fetch_andnot_relaxed
@@ -1323,7 +1333,7 @@ atomic64_fetch_andnot_relaxed(s64 i, ato
 static inline void
 atomic64_or(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_or(i, v);
 }
 #define atomic64_or atomic64_or
@@ -1332,7 +1342,7 @@ atomic64_or(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or(i, v);
 }
 #define atomic64_fetch_or atomic64_fetch_or
@@ -1342,7 +1352,7 @@ atomic64_fetch_or(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_acquire(i, v);
 }
 #define atomic64_fetch_or_acquire atomic64_fetch_or_acquire
@@ -1352,7 +1362,7 @@ atomic64_fetch_or_acquire(s64 i, atomic6
 static inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_release(i, v);
 }
 #define atomic64_fetch_or_release atomic64_fetch_or_release
@@ -1362,7 +1372,7 @@ atomic64_fetch_or_release(s64 i, atomic6
 static inline s64
 atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_or_relaxed(i, v);
 }
 #define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
@@ -1371,7 +1381,7 @@ atomic64_fetch_or_relaxed(s64 i, atomic6
 static inline void
 atomic64_xor(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	arch_atomic64_xor(i, v);
 }
 #define atomic64_xor atomic64_xor
@@ -1380,7 +1390,7 @@ atomic64_xor(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor(i, v);
 }
 #define atomic64_fetch_xor atomic64_fetch_xor
@@ -1390,7 +1400,7 @@ atomic64_fetch_xor(s64 i, atomic64_t *v)
 static inline s64
 atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_acquire(i, v);
 }
 #define atomic64_fetch_xor_acquire atomic64_fetch_xor_acquire
@@ -1400,7 +1410,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic
 static inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_release(i, v);
 }
 #define atomic64_fetch_xor_release atomic64_fetch_xor_release
@@ -1410,7 +1420,7 @@ atomic64_fetch_xor_release(s64 i, atomic
 static inline s64
 atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_xor_relaxed(i, v);
 }
 #define atomic64_fetch_xor_relaxed atomic64_fetch_xor_relaxed
@@ -1420,7 +1430,7 @@ atomic64_fetch_xor_relaxed(s64 i, atomic
 static inline s64
 atomic64_xchg(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg(v, i);
 }
 #define atomic64_xchg atomic64_xchg
@@ -1430,7 +1440,7 @@ atomic64_xchg(atomic64_t *v, s64 i)
 static inline s64
 atomic64_xchg_acquire(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_acquire(v, i);
 }
 #define atomic64_xchg_acquire atomic64_xchg_acquire
@@ -1440,7 +1450,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64
 static inline s64
 atomic64_xchg_release(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_release(v, i);
 }
 #define atomic64_xchg_release atomic64_xchg_release
@@ -1450,7 +1460,7 @@ atomic64_xchg_release(atomic64_t *v, s64
 static inline s64
 atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_xchg_relaxed(v, i);
 }
 #define atomic64_xchg_relaxed atomic64_xchg_relaxed
@@ -1460,7 +1470,7 @@ atomic64_xchg_relaxed(atomic64_t *v, s64
 static inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg(v, old, new);
 }
 #define atomic64_cmpxchg atomic64_cmpxchg
@@ -1470,7 +1480,7 @@ atomic64_cmpxchg(atomic64_t *v, s64 old,
 static inline s64
 atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_acquire(v, old, new);
 }
 #define atomic64_cmpxchg_acquire atomic64_cmpxchg_acquire
@@ -1480,7 +1490,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v,
 static inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_release(v, old, new);
 }
 #define atomic64_cmpxchg_release atomic64_cmpxchg_release
@@ -1490,7 +1500,7 @@ atomic64_cmpxchg_release(atomic64_t *v,
 static inline s64
 atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_cmpxchg_relaxed(v, old, new);
 }
 #define atomic64_cmpxchg_relaxed atomic64_cmpxchg_relaxed
@@ -1500,8 +1510,8 @@ atomic64_cmpxchg_relaxed(atomic64_t *v,
 static inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg(v, old, new);
 }
 #define atomic64_try_cmpxchg atomic64_try_cmpxchg
@@ -1511,8 +1521,8 @@ atomic64_try_cmpxchg(atomic64_t *v, s64
 static inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_acquire(v, old, new);
 }
 #define atomic64_try_cmpxchg_acquire atomic64_try_cmpxchg_acquire
@@ -1522,8 +1532,8 @@ atomic64_try_cmpxchg_acquire(atomic64_t
 static inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_release(v, old, new);
 }
 #define atomic64_try_cmpxchg_release atomic64_try_cmpxchg_release
@@ -1533,8 +1543,8 @@ atomic64_try_cmpxchg_release(atomic64_t
 static inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_write(old, sizeof(*old));
+	__atomic_check_write(v, sizeof(*v));
+	__atomic_check_write(old, sizeof(*old));
 	return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
 }
 #define atomic64_try_cmpxchg_relaxed atomic64_try_cmpxchg_relaxed
@@ -1544,7 +1554,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t
 static inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_sub_and_test(i, v);
 }
 #define atomic64_sub_and_test atomic64_sub_and_test
@@ -1554,7 +1564,7 @@ atomic64_sub_and_test(s64 i, atomic64_t
 static inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_and_test(v);
 }
 #define atomic64_dec_and_test atomic64_dec_and_test
@@ -1564,7 +1574,7 @@ atomic64_dec_and_test(atomic64_t *v)
 static inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_and_test(v);
 }
 #define atomic64_inc_and_test atomic64_inc_and_test
@@ -1574,7 +1584,7 @@ atomic64_inc_and_test(atomic64_t *v)
 static inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_negative(i, v);
 }
 #define atomic64_add_negative atomic64_add_negative
@@ -1584,7 +1594,7 @@ atomic64_add_negative(s64 i, atomic64_t
 static inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_fetch_add_unless(v, a, u);
 }
 #define atomic64_fetch_add_unless atomic64_fetch_add_unless
@@ -1594,7 +1604,7 @@ atomic64_fetch_add_unless(atomic64_t *v,
 static inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_add_unless(v, a, u);
 }
 #define atomic64_add_unless atomic64_add_unless
@@ -1604,7 +1614,7 @@ atomic64_add_unless(atomic64_t *v, s64 a
 static inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_not_zero(v);
 }
 #define atomic64_inc_not_zero atomic64_inc_not_zero
@@ -1614,7 +1624,7 @@ atomic64_inc_not_zero(atomic64_t *v)
 static inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_inc_unless_negative(v);
 }
 #define atomic64_inc_unless_negative atomic64_inc_unless_negative
@@ -1624,7 +1634,7 @@ atomic64_inc_unless_negative(atomic64_t
 static inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_unless_positive(v);
 }
 #define atomic64_dec_unless_positive atomic64_dec_unless_positive
@@ -1634,7 +1644,7 @@ atomic64_dec_unless_positive(atomic64_t
 static inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
-	kasan_check_write(v, sizeof(*v));
+	__atomic_check_write(v, sizeof(*v));
 	return arch_atomic64_dec_if_positive(v);
 }
 #define atomic64_dec_if_positive atomic64_dec_if_positive
@@ -1644,7 +1654,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1653,7 +1663,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1662,7 +1672,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1671,7 +1681,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define xchg_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_xchg_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1680,7 +1690,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1689,7 +1699,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1698,7 +1708,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1707,7 +1717,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1716,7 +1726,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1725,7 +1735,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_acquire(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1734,7 +1744,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_release(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1743,7 +1753,7 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg64_relaxed(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__);				\
 })
 #endif
@@ -1751,28 +1761,28 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define cmpxchg64_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define sync_cmpxchg(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, sizeof(*__ai_ptr));		\
 	arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__);				\
 })
 
 #define cmpxchg_double(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
 	arch_cmpxchg_double(__ai_ptr, __VA_ARGS__);				\
 })
 
@@ -1780,9 +1790,9 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define cmpxchg_double_local(ptr, ...)						\
 ({									\
 	typeof(ptr) __ai_ptr = (ptr);					\
-	kasan_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
+	__atomic_check_write(__ai_ptr, 2 * sizeof(*__ai_ptr));		\
 	arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__);				\
 })
 
 #endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// b29b625d5de9280f680e42c7be859b55b15e5f6a
+// aa929c117bdd954a0957b91fe509f118ca8b9707
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -20,7 +20,7 @@ gen_param_check()
 	# We don't write to constant parameters
 	[ ${type#c} != ${type} ] && rw="read"
 
-	printf "\tkasan_check_${rw}(${name}, sizeof(*${name}));\n"
+	printf "\t__atomic_check_${rw}(${name}, sizeof(*${name}));\n"
 }
 
 #gen_param_check(arg...)
@@ -107,7 +107,7 @@ cat <<EOF
 #define ${xchg}(ptr, ...)						\\
 ({									\\
 	typeof(ptr) __ai_ptr = (ptr);					\\
-	kasan_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
+	__atomic_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr));		\\
 	arch_${xchg}(__ai_ptr, __VA_ARGS__);				\\
 })
 EOF
@@ -149,6 +149,16 @@ cat << EOF
 #include <linux/build_bug.h>
 #include <linux/kasan-checks.h>
 
+static inline void __atomic_check_read(const volatile void *v, size_t size)
+{
+	kasan_check_read(v, size);
+}
+
+static inline void __atomic_check_write(const volatile void *v, size_t size)
+{
+	kasan_check_write(v, size);
+}
+
 EOF
 
 grep '^[a-z]' "$1" | while read name meta args; do




* [PATCH v3 17/22] asm-generic/atomic: Use __always_inline for pure wrappers
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Randy Dunlap, Marco Elver, Mark Rutland

From: Marco Elver <elver@google.com>

Prefer __always_inline for atomic wrappers. When building for size
(CC_OPTIMIZE_FOR_SIZE), some compilers appear less inclined to inline
even relatively small static inline functions that are assumed to be
inlinable, such as atomic ops. This can cause problems, for example in
UACCESS regions.

By using __always_inline, we let the real implementation and not the
wrapper determine the final inlining preference.

For x86 tinyconfig we observe:
 - vmlinux baseline: 1316204
 - vmlinux with patch: 1315988 (-216 bytes)

This came up when addressing UACCESS warnings with CC_OPTIMIZE_FOR_SIZE
in the KCSAN runtime:
http://lkml.kernel.org/r/58708908-84a0-0a81-a836-ad97e33dbb62@infradead.org

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
 include/asm-generic/atomic-instrumented.h |  335 +++++++++++++++---------------
 include/asm-generic/atomic-long.h         |  331 ++++++++++++++---------------
 scripts/atomic/gen-atomic-instrumented.sh |    7 
 scripts/atomic/gen-atomic-long.sh         |    3 
 4 files changed, 340 insertions(+), 336 deletions(-)

--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -18,19 +18,20 @@
 #define _ASM_GENERIC_ATOMIC_INSTRUMENTED_H
 
 #include <linux/build_bug.h>
+#include <linux/compiler.h>
 #include <linux/kasan-checks.h>
 
-static inline void __atomic_check_read(const volatile void *v, size_t size)
+static __always_inline void __atomic_check_read(const volatile void *v, size_t size)
 {
 	kasan_check_read(v, size);
 }
 
-static inline void __atomic_check_write(const volatile void *v, size_t size)
+static __always_inline void __atomic_check_write(const volatile void *v, size_t size)
 {
 	kasan_check_write(v, size);
 }
 
-static inline int
+static __always_inline int
 atomic_read(const atomic_t *v)
 {
 	__atomic_check_read(v, sizeof(*v));
@@ -39,7 +40,7 @@ atomic_read(const atomic_t *v)
 #define atomic_read atomic_read
 
 #if defined(arch_atomic_read_acquire)
-static inline int
+static __always_inline int
 atomic_read_acquire(const atomic_t *v)
 {
 	__atomic_check_read(v, sizeof(*v));
@@ -48,7 +49,7 @@ atomic_read_acquire(const atomic_t *v)
 #define atomic_read_acquire atomic_read_acquire
 #endif
 
-static inline void
+static __always_inline void
 atomic_set(atomic_t *v, int i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -57,7 +58,7 @@ atomic_set(atomic_t *v, int i)
 #define atomic_set atomic_set
 
 #if defined(arch_atomic_set_release)
-static inline void
+static __always_inline void
 atomic_set_release(atomic_t *v, int i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -66,7 +67,7 @@ atomic_set_release(atomic_t *v, int i)
 #define atomic_set_release atomic_set_release
 #endif
 
-static inline void
+static __always_inline void
 atomic_add(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -75,7 +76,7 @@ atomic_add(int i, atomic_t *v)
 #define atomic_add atomic_add
 
 #if !defined(arch_atomic_add_return_relaxed) || defined(arch_atomic_add_return)
-static inline int
+static __always_inline int
 atomic_add_return(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -85,7 +86,7 @@ atomic_add_return(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_add_return_acquire)
-static inline int
+static __always_inline int
 atomic_add_return_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -95,7 +96,7 @@ atomic_add_return_acquire(int i, atomic_
 #endif
 
 #if defined(arch_atomic_add_return_release)
-static inline int
+static __always_inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -105,7 +106,7 @@ atomic_add_return_release(int i, atomic_
 #endif
 
 #if defined(arch_atomic_add_return_relaxed)
-static inline int
+static __always_inline int
 atomic_add_return_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -115,7 +116,7 @@ atomic_add_return_relaxed(int i, atomic_
 #endif
 
 #if !defined(arch_atomic_fetch_add_relaxed) || defined(arch_atomic_fetch_add)
-static inline int
+static __always_inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -125,7 +126,7 @@ atomic_fetch_add(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_add_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_add_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -135,7 +136,7 @@ atomic_fetch_add_acquire(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_add_release)
-static inline int
+static __always_inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -145,7 +146,7 @@ atomic_fetch_add_release(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_add_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_add_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -154,7 +155,7 @@ atomic_fetch_add_relaxed(int i, atomic_t
 #define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic_sub(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -163,7 +164,7 @@ atomic_sub(int i, atomic_t *v)
 #define atomic_sub atomic_sub
 
 #if !defined(arch_atomic_sub_return_relaxed) || defined(arch_atomic_sub_return)
-static inline int
+static __always_inline int
 atomic_sub_return(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -173,7 +174,7 @@ atomic_sub_return(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_sub_return_acquire)
-static inline int
+static __always_inline int
 atomic_sub_return_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -183,7 +184,7 @@ atomic_sub_return_acquire(int i, atomic_
 #endif
 
 #if defined(arch_atomic_sub_return_release)
-static inline int
+static __always_inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -193,7 +194,7 @@ atomic_sub_return_release(int i, atomic_
 #endif
 
 #if defined(arch_atomic_sub_return_relaxed)
-static inline int
+static __always_inline int
 atomic_sub_return_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -203,7 +204,7 @@ atomic_sub_return_relaxed(int i, atomic_
 #endif
 
 #if !defined(arch_atomic_fetch_sub_relaxed) || defined(arch_atomic_fetch_sub)
-static inline int
+static __always_inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -213,7 +214,7 @@ atomic_fetch_sub(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_sub_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -223,7 +224,7 @@ atomic_fetch_sub_acquire(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_sub_release)
-static inline int
+static __always_inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -233,7 +234,7 @@ atomic_fetch_sub_release(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_sub_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_sub_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -243,7 +244,7 @@ atomic_fetch_sub_relaxed(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_inc)
-static inline void
+static __always_inline void
 atomic_inc(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -253,7 +254,7 @@ atomic_inc(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_inc_return)
-static inline int
+static __always_inline int
 atomic_inc_return(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -263,7 +264,7 @@ atomic_inc_return(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_inc_return_acquire)
-static inline int
+static __always_inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -273,7 +274,7 @@ atomic_inc_return_acquire(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_inc_return_release)
-static inline int
+static __always_inline int
 atomic_inc_return_release(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -283,7 +284,7 @@ atomic_inc_return_release(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_inc_return_relaxed)
-static inline int
+static __always_inline int
 atomic_inc_return_relaxed(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -293,7 +294,7 @@ atomic_inc_return_relaxed(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_inc)
-static inline int
+static __always_inline int
 atomic_fetch_inc(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -303,7 +304,7 @@ atomic_fetch_inc(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_inc_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -313,7 +314,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_inc_release)
-static inline int
+static __always_inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -323,7 +324,7 @@ atomic_fetch_inc_release(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_inc_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_inc_relaxed(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -333,7 +334,7 @@ atomic_fetch_inc_relaxed(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec)
-static inline void
+static __always_inline void
 atomic_dec(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -343,7 +344,7 @@ atomic_dec(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_return)
-static inline int
+static __always_inline int
 atomic_dec_return(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -353,7 +354,7 @@ atomic_dec_return(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_return_acquire)
-static inline int
+static __always_inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -363,7 +364,7 @@ atomic_dec_return_acquire(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_return_release)
-static inline int
+static __always_inline int
 atomic_dec_return_release(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -373,7 +374,7 @@ atomic_dec_return_release(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_return_relaxed)
-static inline int
+static __always_inline int
 atomic_dec_return_relaxed(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -383,7 +384,7 @@ atomic_dec_return_relaxed(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_dec)
-static inline int
+static __always_inline int
 atomic_fetch_dec(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -393,7 +394,7 @@ atomic_fetch_dec(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_dec_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -403,7 +404,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_dec_release)
-static inline int
+static __always_inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -413,7 +414,7 @@ atomic_fetch_dec_release(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_dec_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_dec_relaxed(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -422,7 +423,7 @@ atomic_fetch_dec_relaxed(atomic_t *v)
 #define atomic_fetch_dec_relaxed atomic_fetch_dec_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic_and(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -431,7 +432,7 @@ atomic_and(int i, atomic_t *v)
 #define atomic_and atomic_and
 
 #if !defined(arch_atomic_fetch_and_relaxed) || defined(arch_atomic_fetch_and)
-static inline int
+static __always_inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -441,7 +442,7 @@ atomic_fetch_and(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_and_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_and_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -451,7 +452,7 @@ atomic_fetch_and_acquire(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_and_release)
-static inline int
+static __always_inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -461,7 +462,7 @@ atomic_fetch_and_release(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_and_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_and_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -471,7 +472,7 @@ atomic_fetch_and_relaxed(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_andnot)
-static inline void
+static __always_inline void
 atomic_andnot(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -481,7 +482,7 @@ atomic_andnot(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_andnot)
-static inline int
+static __always_inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -491,7 +492,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_andnot_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -501,7 +502,7 @@ atomic_fetch_andnot_acquire(int i, atomi
 #endif
 
 #if defined(arch_atomic_fetch_andnot_release)
-static inline int
+static __always_inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -511,7 +512,7 @@ atomic_fetch_andnot_release(int i, atomi
 #endif
 
 #if defined(arch_atomic_fetch_andnot_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -520,7 +521,7 @@ atomic_fetch_andnot_relaxed(int i, atomi
 #define atomic_fetch_andnot_relaxed atomic_fetch_andnot_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic_or(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -529,7 +530,7 @@ atomic_or(int i, atomic_t *v)
 #define atomic_or atomic_or
 
 #if !defined(arch_atomic_fetch_or_relaxed) || defined(arch_atomic_fetch_or)
-static inline int
+static __always_inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -539,7 +540,7 @@ atomic_fetch_or(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_or_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_or_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -549,7 +550,7 @@ atomic_fetch_or_acquire(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_or_release)
-static inline int
+static __always_inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -559,7 +560,7 @@ atomic_fetch_or_release(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_or_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_or_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -568,7 +569,7 @@ atomic_fetch_or_relaxed(int i, atomic_t
 #define atomic_fetch_or_relaxed atomic_fetch_or_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic_xor(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -577,7 +578,7 @@ atomic_xor(int i, atomic_t *v)
 #define atomic_xor atomic_xor
 
 #if !defined(arch_atomic_fetch_xor_relaxed) || defined(arch_atomic_fetch_xor)
-static inline int
+static __always_inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -587,7 +588,7 @@ atomic_fetch_xor(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_xor_acquire)
-static inline int
+static __always_inline int
 atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -597,7 +598,7 @@ atomic_fetch_xor_acquire(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_xor_release)
-static inline int
+static __always_inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -607,7 +608,7 @@ atomic_fetch_xor_release(int i, atomic_t
 #endif
 
 #if defined(arch_atomic_fetch_xor_relaxed)
-static inline int
+static __always_inline int
 atomic_fetch_xor_relaxed(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -617,7 +618,7 @@ atomic_fetch_xor_relaxed(int i, atomic_t
 #endif
 
 #if !defined(arch_atomic_xchg_relaxed) || defined(arch_atomic_xchg)
-static inline int
+static __always_inline int
 atomic_xchg(atomic_t *v, int i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -627,7 +628,7 @@ atomic_xchg(atomic_t *v, int i)
 #endif
 
 #if defined(arch_atomic_xchg_acquire)
-static inline int
+static __always_inline int
 atomic_xchg_acquire(atomic_t *v, int i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -637,7 +638,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
 #endif
 
 #if defined(arch_atomic_xchg_release)
-static inline int
+static __always_inline int
 atomic_xchg_release(atomic_t *v, int i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -647,7 +648,7 @@ atomic_xchg_release(atomic_t *v, int i)
 #endif
 
 #if defined(arch_atomic_xchg_relaxed)
-static inline int
+static __always_inline int
 atomic_xchg_relaxed(atomic_t *v, int i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -657,7 +658,7 @@ atomic_xchg_relaxed(atomic_t *v, int i)
 #endif
 
 #if !defined(arch_atomic_cmpxchg_relaxed) || defined(arch_atomic_cmpxchg)
-static inline int
+static __always_inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -667,7 +668,7 @@ atomic_cmpxchg(atomic_t *v, int old, int
 #endif
 
 #if defined(arch_atomic_cmpxchg_acquire)
-static inline int
+static __always_inline int
 atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -677,7 +678,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int
 #endif
 
 #if defined(arch_atomic_cmpxchg_release)
-static inline int
+static __always_inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -687,7 +688,7 @@ atomic_cmpxchg_release(atomic_t *v, int
 #endif
 
 #if defined(arch_atomic_cmpxchg_relaxed)
-static inline int
+static __always_inline int
 atomic_cmpxchg_relaxed(atomic_t *v, int old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -697,7 +698,7 @@ atomic_cmpxchg_relaxed(atomic_t *v, int
 #endif
 
 #if defined(arch_atomic_try_cmpxchg)
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -708,7 +709,7 @@ atomic_try_cmpxchg(atomic_t *v, int *old
 #endif
 
 #if defined(arch_atomic_try_cmpxchg_acquire)
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -719,7 +720,7 @@ atomic_try_cmpxchg_acquire(atomic_t *v,
 #endif
 
 #if defined(arch_atomic_try_cmpxchg_release)
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -730,7 +731,7 @@ atomic_try_cmpxchg_release(atomic_t *v,
 #endif
 
 #if defined(arch_atomic_try_cmpxchg_relaxed)
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -741,7 +742,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v,
 #endif
 
 #if defined(arch_atomic_sub_and_test)
-static inline bool
+static __always_inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -751,7 +752,7 @@ atomic_sub_and_test(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_and_test)
-static inline bool
+static __always_inline bool
 atomic_dec_and_test(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -761,7 +762,7 @@ atomic_dec_and_test(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_inc_and_test)
-static inline bool
+static __always_inline bool
 atomic_inc_and_test(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -771,7 +772,7 @@ atomic_inc_and_test(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_add_negative)
-static inline bool
+static __always_inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -781,7 +782,7 @@ atomic_add_negative(int i, atomic_t *v)
 #endif
 
 #if defined(arch_atomic_fetch_add_unless)
-static inline int
+static __always_inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -791,7 +792,7 @@ atomic_fetch_add_unless(atomic_t *v, int
 #endif
 
 #if defined(arch_atomic_add_unless)
-static inline bool
+static __always_inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -801,7 +802,7 @@ atomic_add_unless(atomic_t *v, int a, in
 #endif
 
 #if defined(arch_atomic_inc_not_zero)
-static inline bool
+static __always_inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -811,7 +812,7 @@ atomic_inc_not_zero(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_inc_unless_negative)
-static inline bool
+static __always_inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -821,7 +822,7 @@ atomic_inc_unless_negative(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_unless_positive)
-static inline bool
+static __always_inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -831,7 +832,7 @@ atomic_dec_unless_positive(atomic_t *v)
 #endif
 
 #if defined(arch_atomic_dec_if_positive)
-static inline int
+static __always_inline int
 atomic_dec_if_positive(atomic_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -840,7 +841,7 @@ atomic_dec_if_positive(atomic_t *v)
 #define atomic_dec_if_positive atomic_dec_if_positive
 #endif
 
-static inline s64
+static __always_inline s64
 atomic64_read(const atomic64_t *v)
 {
 	__atomic_check_read(v, sizeof(*v));
@@ -849,7 +850,7 @@ atomic64_read(const atomic64_t *v)
 #define atomic64_read atomic64_read
 
 #if defined(arch_atomic64_read_acquire)
-static inline s64
+static __always_inline s64
 atomic64_read_acquire(const atomic64_t *v)
 {
 	__atomic_check_read(v, sizeof(*v));
@@ -858,7 +859,7 @@ atomic64_read_acquire(const atomic64_t *
 #define atomic64_read_acquire atomic64_read_acquire
 #endif
 
-static inline void
+static __always_inline void
 atomic64_set(atomic64_t *v, s64 i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -867,7 +868,7 @@ atomic64_set(atomic64_t *v, s64 i)
 #define atomic64_set atomic64_set
 
 #if defined(arch_atomic64_set_release)
-static inline void
+static __always_inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -876,7 +877,7 @@ atomic64_set_release(atomic64_t *v, s64
 #define atomic64_set_release atomic64_set_release
 #endif
 
-static inline void
+static __always_inline void
 atomic64_add(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -885,7 +886,7 @@ atomic64_add(s64 i, atomic64_t *v)
 #define atomic64_add atomic64_add
 
 #if !defined(arch_atomic64_add_return_relaxed) || defined(arch_atomic64_add_return)
-static inline s64
+static __always_inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -895,7 +896,7 @@ atomic64_add_return(s64 i, atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_add_return_acquire)
-static inline s64
+static __always_inline s64
 atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -905,7 +906,7 @@ atomic64_add_return_acquire(s64 i, atomi
 #endif
 
 #if defined(arch_atomic64_add_return_release)
-static inline s64
+static __always_inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -915,7 +916,7 @@ atomic64_add_return_release(s64 i, atomi
 #endif
 
 #if defined(arch_atomic64_add_return_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_add_return_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -925,7 +926,7 @@ atomic64_add_return_relaxed(s64 i, atomi
 #endif
 
 #if !defined(arch_atomic64_fetch_add_relaxed) || defined(arch_atomic64_fetch_add)
-static inline s64
+static __always_inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -935,7 +936,7 @@ atomic64_fetch_add(s64 i, atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_add_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -945,7 +946,7 @@ atomic64_fetch_add_acquire(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_add_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -955,7 +956,7 @@ atomic64_fetch_add_release(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_add_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -964,7 +965,7 @@ atomic64_fetch_add_relaxed(s64 i, atomic
 #define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic64_sub(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -973,7 +974,7 @@ atomic64_sub(s64 i, atomic64_t *v)
 #define atomic64_sub atomic64_sub
 
 #if !defined(arch_atomic64_sub_return_relaxed) || defined(arch_atomic64_sub_return)
-static inline s64
+static __always_inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -983,7 +984,7 @@ atomic64_sub_return(s64 i, atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_sub_return_acquire)
-static inline s64
+static __always_inline s64
 atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -993,7 +994,7 @@ atomic64_sub_return_acquire(s64 i, atomi
 #endif
 
 #if defined(arch_atomic64_sub_return_release)
-static inline s64
+static __always_inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1003,7 +1004,7 @@ atomic64_sub_return_release(s64 i, atomi
 #endif
 
 #if defined(arch_atomic64_sub_return_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_sub_return_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1013,7 +1014,7 @@ atomic64_sub_return_relaxed(s64 i, atomi
 #endif
 
 #if !defined(arch_atomic64_fetch_sub_relaxed) || defined(arch_atomic64_fetch_sub)
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1023,7 +1024,7 @@ atomic64_fetch_sub(s64 i, atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_sub_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1033,7 +1034,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_sub_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1043,7 +1044,7 @@ atomic64_fetch_sub_release(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_sub_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1053,7 +1054,7 @@ atomic64_fetch_sub_relaxed(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_inc)
-static inline void
+static __always_inline void
 atomic64_inc(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1063,7 +1064,7 @@ atomic64_inc(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_inc_return)
-static inline s64
+static __always_inline s64
 atomic64_inc_return(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1073,7 +1074,7 @@ atomic64_inc_return(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_inc_return_acquire)
-static inline s64
+static __always_inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1083,7 +1084,7 @@ atomic64_inc_return_acquire(atomic64_t *
 #endif
 
 #if defined(arch_atomic64_inc_return_release)
-static inline s64
+static __always_inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1093,7 +1094,7 @@ atomic64_inc_return_release(atomic64_t *
 #endif
 
 #if defined(arch_atomic64_inc_return_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_inc_return_relaxed(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1103,7 +1104,7 @@ atomic64_inc_return_relaxed(atomic64_t *
 #endif
 
 #if defined(arch_atomic64_fetch_inc)
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1113,7 +1114,7 @@ atomic64_fetch_inc(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_inc_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1123,7 +1124,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_fetch_inc_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1133,7 +1134,7 @@ atomic64_fetch_inc_release(atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_fetch_inc_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1143,7 +1144,7 @@ atomic64_fetch_inc_relaxed(atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_dec)
-static inline void
+static __always_inline void
 atomic64_dec(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1153,7 +1154,7 @@ atomic64_dec(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_dec_return)
-static inline s64
+static __always_inline s64
 atomic64_dec_return(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1163,7 +1164,7 @@ atomic64_dec_return(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_dec_return_acquire)
-static inline s64
+static __always_inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1173,7 +1174,7 @@ atomic64_dec_return_acquire(atomic64_t *
 #endif
 
 #if defined(arch_atomic64_dec_return_release)
-static inline s64
+static __always_inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1183,7 +1184,7 @@ atomic64_dec_return_release(atomic64_t *
 #endif
 
 #if defined(arch_atomic64_dec_return_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_dec_return_relaxed(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1193,7 +1194,7 @@ atomic64_dec_return_relaxed(atomic64_t *
 #endif
 
 #if defined(arch_atomic64_fetch_dec)
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1203,7 +1204,7 @@ atomic64_fetch_dec(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_dec_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1213,7 +1214,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_fetch_dec_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1223,7 +1224,7 @@ atomic64_fetch_dec_release(atomic64_t *v
 #endif
 
 #if defined(arch_atomic64_fetch_dec_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1232,7 +1233,7 @@ atomic64_fetch_dec_relaxed(atomic64_t *v
 #define atomic64_fetch_dec_relaxed atomic64_fetch_dec_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic64_and(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1241,7 +1242,7 @@ atomic64_and(s64 i, atomic64_t *v)
 #define atomic64_and atomic64_and
 
 #if !defined(arch_atomic64_fetch_and_relaxed) || defined(arch_atomic64_fetch_and)
-static inline s64
+static __always_inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1251,7 +1252,7 @@ atomic64_fetch_and(s64 i, atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_and_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1261,7 +1262,7 @@ atomic64_fetch_and_acquire(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_and_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1271,7 +1272,7 @@ atomic64_fetch_and_release(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_and_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_and_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1281,7 +1282,7 @@ atomic64_fetch_and_relaxed(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_andnot)
-static inline void
+static __always_inline void
 atomic64_andnot(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1291,7 +1292,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_andnot)
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1301,7 +1302,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t
 #endif
 
 #if defined(arch_atomic64_fetch_andnot_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1311,7 +1312,7 @@ atomic64_fetch_andnot_acquire(s64 i, ato
 #endif
 
 #if defined(arch_atomic64_fetch_andnot_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1321,7 +1322,7 @@ atomic64_fetch_andnot_release(s64 i, ato
 #endif
 
 #if defined(arch_atomic64_fetch_andnot_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1330,7 +1331,7 @@ atomic64_fetch_andnot_relaxed(s64 i, ato
 #define atomic64_fetch_andnot_relaxed atomic64_fetch_andnot_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic64_or(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1339,7 +1340,7 @@ atomic64_or(s64 i, atomic64_t *v)
 #define atomic64_or atomic64_or
 
 #if !defined(arch_atomic64_fetch_or_relaxed) || defined(arch_atomic64_fetch_or)
-static inline s64
+static __always_inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1349,7 +1350,7 @@ atomic64_fetch_or(s64 i, atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_or_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1359,7 +1360,7 @@ atomic64_fetch_or_acquire(s64 i, atomic6
 #endif
 
 #if defined(arch_atomic64_fetch_or_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1369,7 +1370,7 @@ atomic64_fetch_or_release(s64 i, atomic6
 #endif
 
 #if defined(arch_atomic64_fetch_or_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_or_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1378,7 +1379,7 @@ atomic64_fetch_or_relaxed(s64 i, atomic6
 #define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
 #endif
 
-static inline void
+static __always_inline void
 atomic64_xor(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1387,7 +1388,7 @@ atomic64_xor(s64 i, atomic64_t *v)
 #define atomic64_xor atomic64_xor
 
 #if !defined(arch_atomic64_fetch_xor_relaxed) || defined(arch_atomic64_fetch_xor)
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1397,7 +1398,7 @@ atomic64_fetch_xor(s64 i, atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_fetch_xor_acquire)
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1407,7 +1408,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_xor_release)
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1417,7 +1418,7 @@ atomic64_fetch_xor_release(s64 i, atomic
 #endif
 
 #if defined(arch_atomic64_fetch_xor_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor_relaxed(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1427,7 +1428,7 @@ atomic64_fetch_xor_relaxed(s64 i, atomic
 #endif
 
 #if !defined(arch_atomic64_xchg_relaxed) || defined(arch_atomic64_xchg)
-static inline s64
+static __always_inline s64
 atomic64_xchg(atomic64_t *v, s64 i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1437,7 +1438,7 @@ atomic64_xchg(atomic64_t *v, s64 i)
 #endif
 
 #if defined(arch_atomic64_xchg_acquire)
-static inline s64
+static __always_inline s64
 atomic64_xchg_acquire(atomic64_t *v, s64 i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1447,7 +1448,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64
 #endif
 
 #if defined(arch_atomic64_xchg_release)
-static inline s64
+static __always_inline s64
 atomic64_xchg_release(atomic64_t *v, s64 i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1457,7 +1458,7 @@ atomic64_xchg_release(atomic64_t *v, s64
 #endif
 
 #if defined(arch_atomic64_xchg_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_xchg_relaxed(atomic64_t *v, s64 i)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1467,7 +1468,7 @@ atomic64_xchg_relaxed(atomic64_t *v, s64
 #endif
 
 #if !defined(arch_atomic64_cmpxchg_relaxed) || defined(arch_atomic64_cmpxchg)
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1477,7 +1478,7 @@ atomic64_cmpxchg(atomic64_t *v, s64 old,
 #endif
 
 #if defined(arch_atomic64_cmpxchg_acquire)
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1487,7 +1488,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v,
 #endif
 
 #if defined(arch_atomic64_cmpxchg_release)
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1497,7 +1498,7 @@ atomic64_cmpxchg_release(atomic64_t *v,
 #endif
 
 #if defined(arch_atomic64_cmpxchg_relaxed)
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg_relaxed(atomic64_t *v, s64 old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1507,7 +1508,7 @@ atomic64_cmpxchg_relaxed(atomic64_t *v,
 #endif
 
 #if defined(arch_atomic64_try_cmpxchg)
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1518,7 +1519,7 @@ atomic64_try_cmpxchg(atomic64_t *v, s64
 #endif
 
 #if defined(arch_atomic64_try_cmpxchg_acquire)
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1529,7 +1530,7 @@ atomic64_try_cmpxchg_acquire(atomic64_t
 #endif
 
 #if defined(arch_atomic64_try_cmpxchg_release)
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1540,7 +1541,7 @@ atomic64_try_cmpxchg_release(atomic64_t
 #endif
 
 #if defined(arch_atomic64_try_cmpxchg_relaxed)
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1551,7 +1552,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t
 #endif
 
 #if defined(arch_atomic64_sub_and_test)
-static inline bool
+static __always_inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1561,7 +1562,7 @@ atomic64_sub_and_test(s64 i, atomic64_t
 #endif
 
 #if defined(arch_atomic64_dec_and_test)
-static inline bool
+static __always_inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1571,7 +1572,7 @@ atomic64_dec_and_test(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_inc_and_test)
-static inline bool
+static __always_inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1581,7 +1582,7 @@ atomic64_inc_and_test(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_add_negative)
-static inline bool
+static __always_inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1591,7 +1592,7 @@ atomic64_add_negative(s64 i, atomic64_t
 #endif
 
 #if defined(arch_atomic64_fetch_add_unless)
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1601,7 +1602,7 @@ atomic64_fetch_add_unless(atomic64_t *v,
 #endif
 
 #if defined(arch_atomic64_add_unless)
-static inline bool
+static __always_inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1611,7 +1612,7 @@ atomic64_add_unless(atomic64_t *v, s64 a
 #endif
 
 #if defined(arch_atomic64_inc_not_zero)
-static inline bool
+static __always_inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1621,7 +1622,7 @@ atomic64_inc_not_zero(atomic64_t *v)
 #endif
 
 #if defined(arch_atomic64_inc_unless_negative)
-static inline bool
+static __always_inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1631,7 +1632,7 @@ atomic64_inc_unless_negative(atomic64_t
 #endif
 
 #if defined(arch_atomic64_dec_unless_positive)
-static inline bool
+static __always_inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1641,7 +1642,7 @@ atomic64_dec_unless_positive(atomic64_t
 #endif
 
 #if defined(arch_atomic64_dec_if_positive)
-static inline s64
+static __always_inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
 	__atomic_check_write(v, sizeof(*v));
@@ -1795,4 +1796,4 @@ atomic64_dec_if_positive(atomic64_t *v)
 })
 
 #endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// aa929c117bdd954a0957b91fe509f118ca8b9707
+// 6c1b0b614b55b76c258f89e205622e8f004871af
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -6,6 +6,7 @@
 #ifndef _ASM_GENERIC_ATOMIC_LONG_H
 #define _ASM_GENERIC_ATOMIC_LONG_H
 
+#include <linux/compiler.h>
 #include <asm/types.h>
 
 #ifdef CONFIG_64BIT
@@ -22,493 +23,493 @@ typedef atomic_t atomic_long_t;
 
 #ifdef CONFIG_64BIT
 
-static inline long
+static __always_inline long
 atomic_long_read(const atomic_long_t *v)
 {
 	return atomic64_read(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_read_acquire(const atomic_long_t *v)
 {
 	return atomic64_read_acquire(v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_set(atomic_long_t *v, long i)
 {
 	atomic64_set(v, i);
 }
 
-static inline void
+static __always_inline void
 atomic_long_set_release(atomic_long_t *v, long i)
 {
 	atomic64_set_release(v, i);
 }
 
-static inline void
+static __always_inline void
 atomic_long_add(long i, atomic_long_t *v)
 {
 	atomic64_add(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return(long i, atomic_long_t *v)
 {
 	return atomic64_add_return(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_add_return_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return_release(long i, atomic_long_t *v)
 {
 	return atomic64_add_return_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_add_return_relaxed(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_add(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_add_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_release(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_add_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_add_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_sub(long i, atomic_long_t *v)
 {
 	atomic64_sub(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return(long i, atomic_long_t *v)
 {
 	return atomic64_sub_return(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_sub_return_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return_release(long i, atomic_long_t *v)
 {
 	return atomic64_sub_return_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_sub_return_relaxed(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_sub(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_sub_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub_release(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_sub_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_sub_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_inc(atomic_long_t *v)
 {
 	atomic64_inc(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return(atomic_long_t *v)
 {
 	return atomic64_inc_return(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return_acquire(atomic_long_t *v)
 {
 	return atomic64_inc_return_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return_release(atomic_long_t *v)
 {
 	return atomic64_inc_return_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return_relaxed(atomic_long_t *v)
 {
 	return atomic64_inc_return_relaxed(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc(atomic_long_t *v)
 {
 	return atomic64_fetch_inc(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc_acquire(atomic_long_t *v)
 {
 	return atomic64_fetch_inc_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc_release(atomic_long_t *v)
 {
 	return atomic64_fetch_inc_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc_relaxed(atomic_long_t *v)
 {
 	return atomic64_fetch_inc_relaxed(v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_dec(atomic_long_t *v)
 {
 	atomic64_dec(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return(atomic_long_t *v)
 {
 	return atomic64_dec_return(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return_acquire(atomic_long_t *v)
 {
 	return atomic64_dec_return_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return_release(atomic_long_t *v)
 {
 	return atomic64_dec_return_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return_relaxed(atomic_long_t *v)
 {
 	return atomic64_dec_return_relaxed(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec(atomic_long_t *v)
 {
 	return atomic64_fetch_dec(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec_acquire(atomic_long_t *v)
 {
 	return atomic64_fetch_dec_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec_release(atomic_long_t *v)
 {
 	return atomic64_fetch_dec_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec_relaxed(atomic_long_t *v)
 {
 	return atomic64_fetch_dec_relaxed(v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_and(long i, atomic_long_t *v)
 {
 	atomic64_and(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_and(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_and_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and_release(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_and_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_and_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_andnot(long i, atomic_long_t *v)
 {
 	atomic64_andnot(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_andnot(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_andnot_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_andnot_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_andnot_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_or(long i, atomic_long_t *v)
 {
 	atomic64_or(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_or(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_or_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or_release(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_or_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_or_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_xor(long i, atomic_long_t *v)
 {
 	atomic64_xor(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_xor(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_xor_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor_release(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_xor_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
 {
 	return atomic64_fetch_xor_relaxed(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg(atomic_long_t *v, long i)
 {
 	return atomic64_xchg(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg_acquire(atomic_long_t *v, long i)
 {
 	return atomic64_xchg_acquire(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg_release(atomic_long_t *v, long i)
 {
 	return atomic64_xchg_release(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg_relaxed(atomic_long_t *v, long i)
 {
 	return atomic64_xchg_relaxed(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
 {
 	return atomic64_cmpxchg(v, old, new);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
 {
 	return atomic64_cmpxchg_acquire(v, old, new);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
 {
 	return atomic64_cmpxchg_release(v, old, new);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
 {
 	return atomic64_cmpxchg_relaxed(v, old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
 	return atomic64_try_cmpxchg(v, (s64 *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 {
 	return atomic64_try_cmpxchg_acquire(v, (s64 *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
 	return atomic64_try_cmpxchg_release(v, (s64 *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 {
 	return atomic64_try_cmpxchg_relaxed(v, (s64 *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_sub_and_test(long i, atomic_long_t *v)
 {
 	return atomic64_sub_and_test(i, v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_dec_and_test(atomic_long_t *v)
 {
 	return atomic64_dec_and_test(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_inc_and_test(atomic_long_t *v)
 {
 	return atomic64_inc_and_test(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_add_negative(long i, atomic_long_t *v)
 {
 	return atomic64_add_negative(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
 {
 	return atomic64_fetch_add_unless(v, a, u);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_add_unless(atomic_long_t *v, long a, long u)
 {
 	return atomic64_add_unless(v, a, u);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_inc_not_zero(atomic_long_t *v)
 {
 	return atomic64_inc_not_zero(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_inc_unless_negative(atomic_long_t *v)
 {
 	return atomic64_inc_unless_negative(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_dec_unless_positive(atomic_long_t *v)
 {
 	return atomic64_dec_unless_positive(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_if_positive(atomic_long_t *v)
 {
 	return atomic64_dec_if_positive(v);
@@ -516,493 +517,493 @@ atomic_long_dec_if_positive(atomic_long_
 
 #else /* CONFIG_64BIT */
 
-static inline long
+static __always_inline long
 atomic_long_read(const atomic_long_t *v)
 {
 	return atomic_read(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_read_acquire(const atomic_long_t *v)
 {
 	return atomic_read_acquire(v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_set(atomic_long_t *v, long i)
 {
 	atomic_set(v, i);
 }
 
-static inline void
+static __always_inline void
 atomic_long_set_release(atomic_long_t *v, long i)
 {
 	atomic_set_release(v, i);
 }
 
-static inline void
+static __always_inline void
 atomic_long_add(long i, atomic_long_t *v)
 {
 	atomic_add(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return(long i, atomic_long_t *v)
 {
 	return atomic_add_return(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return_acquire(long i, atomic_long_t *v)
 {
 	return atomic_add_return_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return_release(long i, atomic_long_t *v)
 {
 	return atomic_add_return_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_add_return_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_add_return_relaxed(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add(long i, atomic_long_t *v)
 {
 	return atomic_fetch_add(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_acquire(long i, atomic_long_t *v)
 {
 	return atomic_fetch_add_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_release(long i, atomic_long_t *v)
 {
 	return atomic_fetch_add_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_fetch_add_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_sub(long i, atomic_long_t *v)
 {
 	atomic_sub(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return(long i, atomic_long_t *v)
 {
 	return atomic_sub_return(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return_acquire(long i, atomic_long_t *v)
 {
 	return atomic_sub_return_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return_release(long i, atomic_long_t *v)
 {
 	return atomic_sub_return_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_sub_return_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_sub_return_relaxed(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub(long i, atomic_long_t *v)
 {
 	return atomic_fetch_sub(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub_acquire(long i, atomic_long_t *v)
 {
 	return atomic_fetch_sub_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub_release(long i, atomic_long_t *v)
 {
 	return atomic_fetch_sub_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_sub_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_fetch_sub_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_inc(atomic_long_t *v)
 {
 	atomic_inc(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return(atomic_long_t *v)
 {
 	return atomic_inc_return(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return_acquire(atomic_long_t *v)
 {
 	return atomic_inc_return_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return_release(atomic_long_t *v)
 {
 	return atomic_inc_return_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_inc_return_relaxed(atomic_long_t *v)
 {
 	return atomic_inc_return_relaxed(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc(atomic_long_t *v)
 {
 	return atomic_fetch_inc(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc_acquire(atomic_long_t *v)
 {
 	return atomic_fetch_inc_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc_release(atomic_long_t *v)
 {
 	return atomic_fetch_inc_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_inc_relaxed(atomic_long_t *v)
 {
 	return atomic_fetch_inc_relaxed(v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_dec(atomic_long_t *v)
 {
 	atomic_dec(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return(atomic_long_t *v)
 {
 	return atomic_dec_return(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return_acquire(atomic_long_t *v)
 {
 	return atomic_dec_return_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return_release(atomic_long_t *v)
 {
 	return atomic_dec_return_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_return_relaxed(atomic_long_t *v)
 {
 	return atomic_dec_return_relaxed(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec(atomic_long_t *v)
 {
 	return atomic_fetch_dec(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec_acquire(atomic_long_t *v)
 {
 	return atomic_fetch_dec_acquire(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec_release(atomic_long_t *v)
 {
 	return atomic_fetch_dec_release(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_dec_relaxed(atomic_long_t *v)
 {
 	return atomic_fetch_dec_relaxed(v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_and(long i, atomic_long_t *v)
 {
 	atomic_and(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and(long i, atomic_long_t *v)
 {
 	return atomic_fetch_and(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and_acquire(long i, atomic_long_t *v)
 {
 	return atomic_fetch_and_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and_release(long i, atomic_long_t *v)
 {
 	return atomic_fetch_and_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_and_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_fetch_and_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_andnot(long i, atomic_long_t *v)
 {
 	atomic_andnot(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot(long i, atomic_long_t *v)
 {
 	return atomic_fetch_andnot(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot_acquire(long i, atomic_long_t *v)
 {
 	return atomic_fetch_andnot_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot_release(long i, atomic_long_t *v)
 {
 	return atomic_fetch_andnot_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_andnot_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_fetch_andnot_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_or(long i, atomic_long_t *v)
 {
 	atomic_or(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or(long i, atomic_long_t *v)
 {
 	return atomic_fetch_or(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or_acquire(long i, atomic_long_t *v)
 {
 	return atomic_fetch_or_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or_release(long i, atomic_long_t *v)
 {
 	return atomic_fetch_or_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_or_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_fetch_or_relaxed(i, v);
 }
 
-static inline void
+static __always_inline void
 atomic_long_xor(long i, atomic_long_t *v)
 {
 	atomic_xor(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor(long i, atomic_long_t *v)
 {
 	return atomic_fetch_xor(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor_acquire(long i, atomic_long_t *v)
 {
 	return atomic_fetch_xor_acquire(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor_release(long i, atomic_long_t *v)
 {
 	return atomic_fetch_xor_release(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_xor_relaxed(long i, atomic_long_t *v)
 {
 	return atomic_fetch_xor_relaxed(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg(atomic_long_t *v, long i)
 {
 	return atomic_xchg(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg_acquire(atomic_long_t *v, long i)
 {
 	return atomic_xchg_acquire(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg_release(atomic_long_t *v, long i)
 {
 	return atomic_xchg_release(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_xchg_relaxed(atomic_long_t *v, long i)
 {
 	return atomic_xchg_relaxed(v, i);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg(atomic_long_t *v, long old, long new)
 {
 	return atomic_cmpxchg(v, old, new);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg_acquire(atomic_long_t *v, long old, long new)
 {
 	return atomic_cmpxchg_acquire(v, old, new);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg_release(atomic_long_t *v, long old, long new)
 {
 	return atomic_cmpxchg_release(v, old, new);
 }
 
-static inline long
+static __always_inline long
 atomic_long_cmpxchg_relaxed(atomic_long_t *v, long old, long new)
 {
 	return atomic_cmpxchg_relaxed(v, old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
 	return atomic_try_cmpxchg(v, (int *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 {
 	return atomic_try_cmpxchg_acquire(v, (int *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
 	return atomic_try_cmpxchg_release(v, (int *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 {
 	return atomic_try_cmpxchg_relaxed(v, (int *)old, new);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_sub_and_test(long i, atomic_long_t *v)
 {
 	return atomic_sub_and_test(i, v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_dec_and_test(atomic_long_t *v)
 {
 	return atomic_dec_and_test(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_inc_and_test(atomic_long_t *v)
 {
 	return atomic_inc_and_test(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_add_negative(long i, atomic_long_t *v)
 {
 	return atomic_add_negative(i, v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_fetch_add_unless(atomic_long_t *v, long a, long u)
 {
 	return atomic_fetch_add_unless(v, a, u);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_add_unless(atomic_long_t *v, long a, long u)
 {
 	return atomic_add_unless(v, a, u);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_inc_not_zero(atomic_long_t *v)
 {
 	return atomic_inc_not_zero(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_inc_unless_negative(atomic_long_t *v)
 {
 	return atomic_inc_unless_negative(v);
 }
 
-static inline bool
+static __always_inline bool
 atomic_long_dec_unless_positive(atomic_long_t *v)
 {
 	return atomic_dec_unless_positive(v);
 }
 
-static inline long
+static __always_inline long
 atomic_long_dec_if_positive(atomic_long_t *v)
 {
 	return atomic_dec_if_positive(v);
@@ -1010,4 +1011,4 @@ atomic_long_dec_if_positive(atomic_long_
 
 #endif /* CONFIG_64BIT */
 #endif /* _ASM_GENERIC_ATOMIC_LONG_H */
-// 77558968132ce4f911ad53f6f52ce423006f6268
+// a624200981f552b2c6be4f32fe44da8289f30d87
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -84,7 +84,7 @@ gen_proto_order_variant()
 	[ ! -z "${guard}" ] && printf "#if ${guard}\n"
 
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomicname}(${params})
 {
 ${checks}
@@ -147,14 +147,15 @@ cat << EOF
 #define _ASM_GENERIC_ATOMIC_INSTRUMENTED_H
 
 #include <linux/build_bug.h>
+#include <linux/compiler.h>
 #include <linux/kasan-checks.h>
 
-static inline void __atomic_check_read(const volatile void *v, size_t size)
+static __always_inline void __atomic_check_read(const volatile void *v, size_t size)
 {
 	kasan_check_read(v, size);
 }
 
-static inline void __atomic_check_write(const volatile void *v, size_t size)
+static __always_inline void __atomic_check_write(const volatile void *v, size_t size)
 {
 	kasan_check_write(v, size);
 }
--- a/scripts/atomic/gen-atomic-long.sh
+++ b/scripts/atomic/gen-atomic-long.sh
@@ -46,7 +46,7 @@ gen_proto_order_variant()
 	local retstmt="$(gen_ret_stmt "${meta}")"
 
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 atomic_long_${name}(${params})
 {
 	${retstmt}${atomic}_${name}(${argscast});
@@ -64,6 +64,7 @@ cat << EOF
 #ifndef _ASM_GENERIC_ATOMIC_LONG_H
 #define _ASM_GENERIC_ATOMIC_LONG_H
 
+#include <linux/compiler.h>
 #include <asm/types.h>
 
 #ifdef CONFIG_64BIT



^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (16 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 17/22] asm-generic/atomic: Use __always_inline for pure wrappers Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 16:55   ` Paul E. McKenney
  2020-02-19 14:47 ` [PATCH v3 19/22] compiler: Simple READ/WRITE_ONCE() implementations Peter Zijlstra
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Mark Rutland, Marco Elver

While the fallback wrappers aren't pure wrappers, they are trivial
nonetheless, and the function they wrap should determine the final
inlining policy.

For x86 tinyconfig we observe:
 - vmlinux baseline: 1315988
 - vmlinux with patch: 1315928 (-60 bytes)

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
diff --git a/include/linux/atomic-fallback.h b/include/linux/atomic-fallback.h
index a7d240e465c0..656b5489b673 100644
--- a/include/linux/atomic-fallback.h
+++ b/include/linux/atomic-fallback.h
@@ -6,6 +6,8 @@
 #ifndef _LINUX_ATOMIC_FALLBACK_H
 #define _LINUX_ATOMIC_FALLBACK_H
 
+#include <linux/compiler.h>
+
 #ifndef xchg_relaxed
 #define xchg_relaxed		xchg
 #define xchg_acquire		xchg
@@ -76,7 +78,7 @@
 #endif /* cmpxchg64_relaxed */
 
 #ifndef atomic_read_acquire
-static inline int
+static __always_inline int
 atomic_read_acquire(const atomic_t *v)
 {
 	return smp_load_acquire(&(v)->counter);
@@ -85,7 +87,7 @@ atomic_read_acquire(const atomic_t *v)
 #endif
 
 #ifndef atomic_set_release
-static inline void
+static __always_inline void
 atomic_set_release(atomic_t *v, int i)
 {
 	smp_store_release(&(v)->counter, i);
@@ -100,7 +102,7 @@ atomic_set_release(atomic_t *v, int i)
 #else /* atomic_add_return_relaxed */
 
 #ifndef atomic_add_return_acquire
-static inline int
+static __always_inline int
 atomic_add_return_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_add_return_relaxed(i, v);
@@ -111,7 +113,7 @@ atomic_add_return_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_add_return_release
-static inline int
+static __always_inline int
 atomic_add_return_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -121,7 +123,7 @@ atomic_add_return_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_add_return
-static inline int
+static __always_inline int
 atomic_add_return(int i, atomic_t *v)
 {
 	int ret;
@@ -142,7 +144,7 @@ atomic_add_return(int i, atomic_t *v)
 #else /* atomic_fetch_add_relaxed */
 
 #ifndef atomic_fetch_add_acquire
-static inline int
+static __always_inline int
 atomic_fetch_add_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_fetch_add_relaxed(i, v);
@@ -153,7 +155,7 @@ atomic_fetch_add_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_add_release
-static inline int
+static __always_inline int
 atomic_fetch_add_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -163,7 +165,7 @@ atomic_fetch_add_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_add
-static inline int
+static __always_inline int
 atomic_fetch_add(int i, atomic_t *v)
 {
 	int ret;
@@ -184,7 +186,7 @@ atomic_fetch_add(int i, atomic_t *v)
 #else /* atomic_sub_return_relaxed */
 
 #ifndef atomic_sub_return_acquire
-static inline int
+static __always_inline int
 atomic_sub_return_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_sub_return_relaxed(i, v);
@@ -195,7 +197,7 @@ atomic_sub_return_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_sub_return_release
-static inline int
+static __always_inline int
 atomic_sub_return_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -205,7 +207,7 @@ atomic_sub_return_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_sub_return
-static inline int
+static __always_inline int
 atomic_sub_return(int i, atomic_t *v)
 {
 	int ret;
@@ -226,7 +228,7 @@ atomic_sub_return(int i, atomic_t *v)
 #else /* atomic_fetch_sub_relaxed */
 
 #ifndef atomic_fetch_sub_acquire
-static inline int
+static __always_inline int
 atomic_fetch_sub_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_fetch_sub_relaxed(i, v);
@@ -237,7 +239,7 @@ atomic_fetch_sub_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_sub_release
-static inline int
+static __always_inline int
 atomic_fetch_sub_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -247,7 +249,7 @@ atomic_fetch_sub_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_sub
-static inline int
+static __always_inline int
 atomic_fetch_sub(int i, atomic_t *v)
 {
 	int ret;
@@ -262,7 +264,7 @@ atomic_fetch_sub(int i, atomic_t *v)
 #endif /* atomic_fetch_sub_relaxed */
 
 #ifndef atomic_inc
-static inline void
+static __always_inline void
 atomic_inc(atomic_t *v)
 {
 	atomic_add(1, v);
@@ -278,7 +280,7 @@ atomic_inc(atomic_t *v)
 #endif /* atomic_inc_return */
 
 #ifndef atomic_inc_return
-static inline int
+static __always_inline int
 atomic_inc_return(atomic_t *v)
 {
 	return atomic_add_return(1, v);
@@ -287,7 +289,7 @@ atomic_inc_return(atomic_t *v)
 #endif
 
 #ifndef atomic_inc_return_acquire
-static inline int
+static __always_inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
 	return atomic_add_return_acquire(1, v);
@@ -296,7 +298,7 @@ atomic_inc_return_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_inc_return_release
-static inline int
+static __always_inline int
 atomic_inc_return_release(atomic_t *v)
 {
 	return atomic_add_return_release(1, v);
@@ -305,7 +307,7 @@ atomic_inc_return_release(atomic_t *v)
 #endif
 
 #ifndef atomic_inc_return_relaxed
-static inline int
+static __always_inline int
 atomic_inc_return_relaxed(atomic_t *v)
 {
 	return atomic_add_return_relaxed(1, v);
@@ -316,7 +318,7 @@ atomic_inc_return_relaxed(atomic_t *v)
 #else /* atomic_inc_return_relaxed */
 
 #ifndef atomic_inc_return_acquire
-static inline int
+static __always_inline int
 atomic_inc_return_acquire(atomic_t *v)
 {
 	int ret = atomic_inc_return_relaxed(v);
@@ -327,7 +329,7 @@ atomic_inc_return_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_inc_return_release
-static inline int
+static __always_inline int
 atomic_inc_return_release(atomic_t *v)
 {
 	__atomic_release_fence();
@@ -337,7 +339,7 @@ atomic_inc_return_release(atomic_t *v)
 #endif
 
 #ifndef atomic_inc_return
-static inline int
+static __always_inline int
 atomic_inc_return(atomic_t *v)
 {
 	int ret;
@@ -359,7 +361,7 @@ atomic_inc_return(atomic_t *v)
 #endif /* atomic_fetch_inc */
 
 #ifndef atomic_fetch_inc
-static inline int
+static __always_inline int
 atomic_fetch_inc(atomic_t *v)
 {
 	return atomic_fetch_add(1, v);
@@ -368,7 +370,7 @@ atomic_fetch_inc(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_inc_acquire
-static inline int
+static __always_inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
 	return atomic_fetch_add_acquire(1, v);
@@ -377,7 +379,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_inc_release
-static inline int
+static __always_inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
 	return atomic_fetch_add_release(1, v);
@@ -386,7 +388,7 @@ atomic_fetch_inc_release(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_inc_relaxed
-static inline int
+static __always_inline int
 atomic_fetch_inc_relaxed(atomic_t *v)
 {
 	return atomic_fetch_add_relaxed(1, v);
@@ -397,7 +399,7 @@ atomic_fetch_inc_relaxed(atomic_t *v)
 #else /* atomic_fetch_inc_relaxed */
 
 #ifndef atomic_fetch_inc_acquire
-static inline int
+static __always_inline int
 atomic_fetch_inc_acquire(atomic_t *v)
 {
 	int ret = atomic_fetch_inc_relaxed(v);
@@ -408,7 +410,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_inc_release
-static inline int
+static __always_inline int
 atomic_fetch_inc_release(atomic_t *v)
 {
 	__atomic_release_fence();
@@ -418,7 +420,7 @@ atomic_fetch_inc_release(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_inc
-static inline int
+static __always_inline int
 atomic_fetch_inc(atomic_t *v)
 {
 	int ret;
@@ -433,7 +435,7 @@ atomic_fetch_inc(atomic_t *v)
 #endif /* atomic_fetch_inc_relaxed */
 
 #ifndef atomic_dec
-static inline void
+static __always_inline void
 atomic_dec(atomic_t *v)
 {
 	atomic_sub(1, v);
@@ -449,7 +451,7 @@ atomic_dec(atomic_t *v)
 #endif /* atomic_dec_return */
 
 #ifndef atomic_dec_return
-static inline int
+static __always_inline int
 atomic_dec_return(atomic_t *v)
 {
 	return atomic_sub_return(1, v);
@@ -458,7 +460,7 @@ atomic_dec_return(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_return_acquire
-static inline int
+static __always_inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
 	return atomic_sub_return_acquire(1, v);
@@ -467,7 +469,7 @@ atomic_dec_return_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_return_release
-static inline int
+static __always_inline int
 atomic_dec_return_release(atomic_t *v)
 {
 	return atomic_sub_return_release(1, v);
@@ -476,7 +478,7 @@ atomic_dec_return_release(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_return_relaxed
-static inline int
+static __always_inline int
 atomic_dec_return_relaxed(atomic_t *v)
 {
 	return atomic_sub_return_relaxed(1, v);
@@ -487,7 +489,7 @@ atomic_dec_return_relaxed(atomic_t *v)
 #else /* atomic_dec_return_relaxed */
 
 #ifndef atomic_dec_return_acquire
-static inline int
+static __always_inline int
 atomic_dec_return_acquire(atomic_t *v)
 {
 	int ret = atomic_dec_return_relaxed(v);
@@ -498,7 +500,7 @@ atomic_dec_return_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_return_release
-static inline int
+static __always_inline int
 atomic_dec_return_release(atomic_t *v)
 {
 	__atomic_release_fence();
@@ -508,7 +510,7 @@ atomic_dec_return_release(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_return
-static inline int
+static __always_inline int
 atomic_dec_return(atomic_t *v)
 {
 	int ret;
@@ -530,7 +532,7 @@ atomic_dec_return(atomic_t *v)
 #endif /* atomic_fetch_dec */
 
 #ifndef atomic_fetch_dec
-static inline int
+static __always_inline int
 atomic_fetch_dec(atomic_t *v)
 {
 	return atomic_fetch_sub(1, v);
@@ -539,7 +541,7 @@ atomic_fetch_dec(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_dec_acquire
-static inline int
+static __always_inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
 	return atomic_fetch_sub_acquire(1, v);
@@ -548,7 +550,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_dec_release
-static inline int
+static __always_inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
 	return atomic_fetch_sub_release(1, v);
@@ -557,7 +559,7 @@ atomic_fetch_dec_release(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_dec_relaxed
-static inline int
+static __always_inline int
 atomic_fetch_dec_relaxed(atomic_t *v)
 {
 	return atomic_fetch_sub_relaxed(1, v);
@@ -568,7 +570,7 @@ atomic_fetch_dec_relaxed(atomic_t *v)
 #else /* atomic_fetch_dec_relaxed */
 
 #ifndef atomic_fetch_dec_acquire
-static inline int
+static __always_inline int
 atomic_fetch_dec_acquire(atomic_t *v)
 {
 	int ret = atomic_fetch_dec_relaxed(v);
@@ -579,7 +581,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_dec_release
-static inline int
+static __always_inline int
 atomic_fetch_dec_release(atomic_t *v)
 {
 	__atomic_release_fence();
@@ -589,7 +591,7 @@ atomic_fetch_dec_release(atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_dec
-static inline int
+static __always_inline int
 atomic_fetch_dec(atomic_t *v)
 {
 	int ret;
@@ -610,7 +612,7 @@ atomic_fetch_dec(atomic_t *v)
 #else /* atomic_fetch_and_relaxed */
 
 #ifndef atomic_fetch_and_acquire
-static inline int
+static __always_inline int
 atomic_fetch_and_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_fetch_and_relaxed(i, v);
@@ -621,7 +623,7 @@ atomic_fetch_and_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_and_release
-static inline int
+static __always_inline int
 atomic_fetch_and_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -631,7 +633,7 @@ atomic_fetch_and_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_and
-static inline int
+static __always_inline int
 atomic_fetch_and(int i, atomic_t *v)
 {
 	int ret;
@@ -646,7 +648,7 @@ atomic_fetch_and(int i, atomic_t *v)
 #endif /* atomic_fetch_and_relaxed */
 
 #ifndef atomic_andnot
-static inline void
+static __always_inline void
 atomic_andnot(int i, atomic_t *v)
 {
 	atomic_and(~i, v);
@@ -662,7 +664,7 @@ atomic_andnot(int i, atomic_t *v)
 #endif /* atomic_fetch_andnot */
 
 #ifndef atomic_fetch_andnot
-static inline int
+static __always_inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
 	return atomic_fetch_and(~i, v);
@@ -671,7 +673,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_andnot_acquire
-static inline int
+static __always_inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
 	return atomic_fetch_and_acquire(~i, v);
@@ -680,7 +682,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_andnot_release
-static inline int
+static __always_inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
 	return atomic_fetch_and_release(~i, v);
@@ -689,7 +691,7 @@ atomic_fetch_andnot_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_andnot_relaxed
-static inline int
+static __always_inline int
 atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 {
 	return atomic_fetch_and_relaxed(~i, v);
@@ -700,7 +702,7 @@ atomic_fetch_andnot_relaxed(int i, atomic_t *v)
 #else /* atomic_fetch_andnot_relaxed */
 
 #ifndef atomic_fetch_andnot_acquire
-static inline int
+static __always_inline int
 atomic_fetch_andnot_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_fetch_andnot_relaxed(i, v);
@@ -711,7 +713,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_andnot_release
-static inline int
+static __always_inline int
 atomic_fetch_andnot_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -721,7 +723,7 @@ atomic_fetch_andnot_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_andnot
-static inline int
+static __always_inline int
 atomic_fetch_andnot(int i, atomic_t *v)
 {
 	int ret;
@@ -742,7 +744,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
 #else /* atomic_fetch_or_relaxed */
 
 #ifndef atomic_fetch_or_acquire
-static inline int
+static __always_inline int
 atomic_fetch_or_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_fetch_or_relaxed(i, v);
@@ -753,7 +755,7 @@ atomic_fetch_or_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_or_release
-static inline int
+static __always_inline int
 atomic_fetch_or_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -763,7 +765,7 @@ atomic_fetch_or_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_or
-static inline int
+static __always_inline int
 atomic_fetch_or(int i, atomic_t *v)
 {
 	int ret;
@@ -784,7 +786,7 @@ atomic_fetch_or(int i, atomic_t *v)
 #else /* atomic_fetch_xor_relaxed */
 
 #ifndef atomic_fetch_xor_acquire
-static inline int
+static __always_inline int
 atomic_fetch_xor_acquire(int i, atomic_t *v)
 {
 	int ret = atomic_fetch_xor_relaxed(i, v);
@@ -795,7 +797,7 @@ atomic_fetch_xor_acquire(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_xor_release
-static inline int
+static __always_inline int
 atomic_fetch_xor_release(int i, atomic_t *v)
 {
 	__atomic_release_fence();
@@ -805,7 +807,7 @@ atomic_fetch_xor_release(int i, atomic_t *v)
 #endif
 
 #ifndef atomic_fetch_xor
-static inline int
+static __always_inline int
 atomic_fetch_xor(int i, atomic_t *v)
 {
 	int ret;
@@ -826,7 +828,7 @@ atomic_fetch_xor(int i, atomic_t *v)
 #else /* atomic_xchg_relaxed */
 
 #ifndef atomic_xchg_acquire
-static inline int
+static __always_inline int
 atomic_xchg_acquire(atomic_t *v, int i)
 {
 	int ret = atomic_xchg_relaxed(v, i);
@@ -837,7 +839,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
 #endif
 
 #ifndef atomic_xchg_release
-static inline int
+static __always_inline int
 atomic_xchg_release(atomic_t *v, int i)
 {
 	__atomic_release_fence();
@@ -847,7 +849,7 @@ atomic_xchg_release(atomic_t *v, int i)
 #endif
 
 #ifndef atomic_xchg
-static inline int
+static __always_inline int
 atomic_xchg(atomic_t *v, int i)
 {
 	int ret;
@@ -868,7 +870,7 @@ atomic_xchg(atomic_t *v, int i)
 #else /* atomic_cmpxchg_relaxed */
 
 #ifndef atomic_cmpxchg_acquire
-static inline int
+static __always_inline int
 atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 {
 	int ret = atomic_cmpxchg_relaxed(v, old, new);
@@ -879,7 +881,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
 #endif
 
 #ifndef atomic_cmpxchg_release
-static inline int
+static __always_inline int
 atomic_cmpxchg_release(atomic_t *v, int old, int new)
 {
 	__atomic_release_fence();
@@ -889,7 +891,7 @@ atomic_cmpxchg_release(atomic_t *v, int old, int new)
 #endif
 
 #ifndef atomic_cmpxchg
-static inline int
+static __always_inline int
 atomic_cmpxchg(atomic_t *v, int old, int new)
 {
 	int ret;
@@ -911,7 +913,7 @@ atomic_cmpxchg(atomic_t *v, int old, int new)
 #endif /* atomic_try_cmpxchg */
 
 #ifndef atomic_try_cmpxchg
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
 	int r, o = *old;
@@ -924,7 +926,7 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 #endif
 
 #ifndef atomic_try_cmpxchg_acquire
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
 	int r, o = *old;
@@ -937,7 +939,7 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 #endif
 
 #ifndef atomic_try_cmpxchg_release
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
 	int r, o = *old;
@@ -950,7 +952,7 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 #endif
 
 #ifndef atomic_try_cmpxchg_relaxed
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
 	int r, o = *old;
@@ -965,7 +967,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 #else /* atomic_try_cmpxchg_relaxed */
 
 #ifndef atomic_try_cmpxchg_acquire
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
 	bool ret = atomic_try_cmpxchg_relaxed(v, old, new);
@@ -976,7 +978,7 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 #endif
 
 #ifndef atomic_try_cmpxchg_release
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
 	__atomic_release_fence();
@@ -986,7 +988,7 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 #endif
 
 #ifndef atomic_try_cmpxchg
-static inline bool
+static __always_inline bool
 atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
 	bool ret;
@@ -1010,7 +1012,7 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
  * true if the result is zero, or false for all
  * other cases.
  */
-static inline bool
+static __always_inline bool
 atomic_sub_and_test(int i, atomic_t *v)
 {
 	return atomic_sub_return(i, v) == 0;
@@ -1027,7 +1029,7 @@ atomic_sub_and_test(int i, atomic_t *v)
  * returns true if the result is 0, or false for all other
  * cases.
  */
-static inline bool
+static __always_inline bool
 atomic_dec_and_test(atomic_t *v)
 {
 	return atomic_dec_return(v) == 0;
@@ -1044,7 +1046,7 @@ atomic_dec_and_test(atomic_t *v)
  * and returns true if the result is zero, or false for all
  * other cases.
  */
-static inline bool
+static __always_inline bool
 atomic_inc_and_test(atomic_t *v)
 {
 	return atomic_inc_return(v) == 0;
@@ -1062,7 +1064,7 @@ atomic_inc_and_test(atomic_t *v)
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
-static inline bool
+static __always_inline bool
 atomic_add_negative(int i, atomic_t *v)
 {
 	return atomic_add_return(i, v) < 0;
@@ -1080,7 +1082,7 @@ atomic_add_negative(int i, atomic_t *v)
  * Atomically adds @a to @v, so long as @v was not already @u.
  * Returns original value of @v
  */
-static inline int
+static __always_inline int
 atomic_fetch_add_unless(atomic_t *v, int a, int u)
 {
 	int c = atomic_read(v);
@@ -1105,7 +1107,7 @@ atomic_fetch_add_unless(atomic_t *v, int a, int u)
  * Atomically adds @a to @v, if @v was not already @u.
  * Returns true if the addition was done.
  */
-static inline bool
+static __always_inline bool
 atomic_add_unless(atomic_t *v, int a, int u)
 {
 	return atomic_fetch_add_unless(v, a, u) != u;
@@ -1121,7 +1123,7 @@ atomic_add_unless(atomic_t *v, int a, int u)
  * Atomically increments @v by 1, if @v is non-zero.
  * Returns true if the increment was done.
  */
-static inline bool
+static __always_inline bool
 atomic_inc_not_zero(atomic_t *v)
 {
 	return atomic_add_unless(v, 1, 0);
@@ -1130,7 +1132,7 @@ atomic_inc_not_zero(atomic_t *v)
 #endif
 
 #ifndef atomic_inc_unless_negative
-static inline bool
+static __always_inline bool
 atomic_inc_unless_negative(atomic_t *v)
 {
 	int c = atomic_read(v);
@@ -1146,7 +1148,7 @@ atomic_inc_unless_negative(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_unless_positive
-static inline bool
+static __always_inline bool
 atomic_dec_unless_positive(atomic_t *v)
 {
 	int c = atomic_read(v);
@@ -1162,7 +1164,7 @@ atomic_dec_unless_positive(atomic_t *v)
 #endif
 
 #ifndef atomic_dec_if_positive
-static inline int
+static __always_inline int
 atomic_dec_if_positive(atomic_t *v)
 {
 	int dec, c = atomic_read(v);
@@ -1186,7 +1188,7 @@ atomic_dec_if_positive(atomic_t *v)
 #endif
 
 #ifndef atomic64_read_acquire
-static inline s64
+static __always_inline s64
 atomic64_read_acquire(const atomic64_t *v)
 {
 	return smp_load_acquire(&(v)->counter);
@@ -1195,7 +1197,7 @@ atomic64_read_acquire(const atomic64_t *v)
 #endif
 
 #ifndef atomic64_set_release
-static inline void
+static __always_inline void
 atomic64_set_release(atomic64_t *v, s64 i)
 {
 	smp_store_release(&(v)->counter, i);
@@ -1210,7 +1212,7 @@ atomic64_set_release(atomic64_t *v, s64 i)
 #else /* atomic64_add_return_relaxed */
 
 #ifndef atomic64_add_return_acquire
-static inline s64
+static __always_inline s64
 atomic64_add_return_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_add_return_relaxed(i, v);
@@ -1221,7 +1223,7 @@ atomic64_add_return_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_add_return_release
-static inline s64
+static __always_inline s64
 atomic64_add_return_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1231,7 +1233,7 @@ atomic64_add_return_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_add_return
-static inline s64
+static __always_inline s64
 atomic64_add_return(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1252,7 +1254,7 @@ atomic64_add_return(s64 i, atomic64_t *v)
 #else /* atomic64_fetch_add_relaxed */
 
 #ifndef atomic64_fetch_add_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_add_relaxed(i, v);
@@ -1263,7 +1265,7 @@ atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_add_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1273,7 +1275,7 @@ atomic64_fetch_add_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_add
-static inline s64
+static __always_inline s64
 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1294,7 +1296,7 @@ atomic64_fetch_add(s64 i, atomic64_t *v)
 #else /* atomic64_sub_return_relaxed */
 
 #ifndef atomic64_sub_return_acquire
-static inline s64
+static __always_inline s64
 atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_sub_return_relaxed(i, v);
@@ -1305,7 +1307,7 @@ atomic64_sub_return_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_sub_return_release
-static inline s64
+static __always_inline s64
 atomic64_sub_return_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1315,7 +1317,7 @@ atomic64_sub_return_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_sub_return
-static inline s64
+static __always_inline s64
 atomic64_sub_return(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1336,7 +1338,7 @@ atomic64_sub_return(s64 i, atomic64_t *v)
 #else /* atomic64_fetch_sub_relaxed */
 
 #ifndef atomic64_fetch_sub_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_sub_relaxed(i, v);
@@ -1347,7 +1349,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_sub_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1357,7 +1359,7 @@ atomic64_fetch_sub_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_sub
-static inline s64
+static __always_inline s64
 atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1372,7 +1374,7 @@ atomic64_fetch_sub(s64 i, atomic64_t *v)
 #endif /* atomic64_fetch_sub_relaxed */
 
 #ifndef atomic64_inc
-static inline void
+static __always_inline void
 atomic64_inc(atomic64_t *v)
 {
 	atomic64_add(1, v);
@@ -1388,7 +1390,7 @@ atomic64_inc(atomic64_t *v)
 #endif /* atomic64_inc_return */
 
 #ifndef atomic64_inc_return
-static inline s64
+static __always_inline s64
 atomic64_inc_return(atomic64_t *v)
 {
 	return atomic64_add_return(1, v);
@@ -1397,7 +1399,7 @@ atomic64_inc_return(atomic64_t *v)
 #endif
 
 #ifndef atomic64_inc_return_acquire
-static inline s64
+static __always_inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
 	return atomic64_add_return_acquire(1, v);
@@ -1406,7 +1408,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_inc_return_release
-static inline s64
+static __always_inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
 	return atomic64_add_return_release(1, v);
@@ -1415,7 +1417,7 @@ atomic64_inc_return_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_inc_return_relaxed
-static inline s64
+static __always_inline s64
 atomic64_inc_return_relaxed(atomic64_t *v)
 {
 	return atomic64_add_return_relaxed(1, v);
@@ -1426,7 +1428,7 @@ atomic64_inc_return_relaxed(atomic64_t *v)
 #else /* atomic64_inc_return_relaxed */
 
 #ifndef atomic64_inc_return_acquire
-static inline s64
+static __always_inline s64
 atomic64_inc_return_acquire(atomic64_t *v)
 {
 	s64 ret = atomic64_inc_return_relaxed(v);
@@ -1437,7 +1439,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_inc_return_release
-static inline s64
+static __always_inline s64
 atomic64_inc_return_release(atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1447,7 +1449,7 @@ atomic64_inc_return_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_inc_return
-static inline s64
+static __always_inline s64
 atomic64_inc_return(atomic64_t *v)
 {
 	s64 ret;
@@ -1469,7 +1471,7 @@ atomic64_inc_return(atomic64_t *v)
 #endif /* atomic64_fetch_inc */
 
 #ifndef atomic64_fetch_inc
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
 	return atomic64_fetch_add(1, v);
@@ -1478,7 +1480,7 @@ atomic64_fetch_inc(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_inc_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
 	return atomic64_fetch_add_acquire(1, v);
@@ -1487,7 +1489,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_inc_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
 	return atomic64_fetch_add_release(1, v);
@@ -1496,7 +1498,7 @@ atomic64_fetch_inc_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_inc_relaxed
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_relaxed(atomic64_t *v)
 {
 	return atomic64_fetch_add_relaxed(1, v);
@@ -1507,7 +1509,7 @@ atomic64_fetch_inc_relaxed(atomic64_t *v)
 #else /* atomic64_fetch_inc_relaxed */
 
 #ifndef atomic64_fetch_inc_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_acquire(atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_inc_relaxed(v);
@@ -1518,7 +1520,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_inc_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc_release(atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1528,7 +1530,7 @@ atomic64_fetch_inc_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_inc
-static inline s64
+static __always_inline s64
 atomic64_fetch_inc(atomic64_t *v)
 {
 	s64 ret;
@@ -1543,7 +1545,7 @@ atomic64_fetch_inc(atomic64_t *v)
 #endif /* atomic64_fetch_inc_relaxed */
 
 #ifndef atomic64_dec
-static inline void
+static __always_inline void
 atomic64_dec(atomic64_t *v)
 {
 	atomic64_sub(1, v);
@@ -1559,7 +1561,7 @@ atomic64_dec(atomic64_t *v)
 #endif /* atomic64_dec_return */
 
 #ifndef atomic64_dec_return
-static inline s64
+static __always_inline s64
 atomic64_dec_return(atomic64_t *v)
 {
 	return atomic64_sub_return(1, v);
@@ -1568,7 +1570,7 @@ atomic64_dec_return(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_return_acquire
-static inline s64
+static __always_inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
 	return atomic64_sub_return_acquire(1, v);
@@ -1577,7 +1579,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_return_release
-static inline s64
+static __always_inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
 	return atomic64_sub_return_release(1, v);
@@ -1586,7 +1588,7 @@ atomic64_dec_return_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_return_relaxed
-static inline s64
+static __always_inline s64
 atomic64_dec_return_relaxed(atomic64_t *v)
 {
 	return atomic64_sub_return_relaxed(1, v);
@@ -1597,7 +1599,7 @@ atomic64_dec_return_relaxed(atomic64_t *v)
 #else /* atomic64_dec_return_relaxed */
 
 #ifndef atomic64_dec_return_acquire
-static inline s64
+static __always_inline s64
 atomic64_dec_return_acquire(atomic64_t *v)
 {
 	s64 ret = atomic64_dec_return_relaxed(v);
@@ -1608,7 +1610,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_return_release
-static inline s64
+static __always_inline s64
 atomic64_dec_return_release(atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1618,7 +1620,7 @@ atomic64_dec_return_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_return
-static inline s64
+static __always_inline s64
 atomic64_dec_return(atomic64_t *v)
 {
 	s64 ret;
@@ -1640,7 +1642,7 @@ atomic64_dec_return(atomic64_t *v)
 #endif /* atomic64_fetch_dec */
 
 #ifndef atomic64_fetch_dec
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
 	return atomic64_fetch_sub(1, v);
@@ -1649,7 +1651,7 @@ atomic64_fetch_dec(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_dec_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
 	return atomic64_fetch_sub_acquire(1, v);
@@ -1658,7 +1660,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_dec_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
 	return atomic64_fetch_sub_release(1, v);
@@ -1667,7 +1669,7 @@ atomic64_fetch_dec_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_dec_relaxed
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_relaxed(atomic64_t *v)
 {
 	return atomic64_fetch_sub_relaxed(1, v);
@@ -1678,7 +1680,7 @@ atomic64_fetch_dec_relaxed(atomic64_t *v)
 #else /* atomic64_fetch_dec_relaxed */
 
 #ifndef atomic64_fetch_dec_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_acquire(atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_dec_relaxed(v);
@@ -1689,7 +1691,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_dec_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec_release(atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1699,7 +1701,7 @@ atomic64_fetch_dec_release(atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_dec
-static inline s64
+static __always_inline s64
 atomic64_fetch_dec(atomic64_t *v)
 {
 	s64 ret;
@@ -1720,7 +1722,7 @@ atomic64_fetch_dec(atomic64_t *v)
 #else /* atomic64_fetch_and_relaxed */
 
 #ifndef atomic64_fetch_and_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_and_relaxed(i, v);
@@ -1731,7 +1733,7 @@ atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_and_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_and_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1741,7 +1743,7 @@ atomic64_fetch_and_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_and
-static inline s64
+static __always_inline s64
 atomic64_fetch_and(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1756,7 +1758,7 @@ atomic64_fetch_and(s64 i, atomic64_t *v)
 #endif /* atomic64_fetch_and_relaxed */
 
 #ifndef atomic64_andnot
-static inline void
+static __always_inline void
 atomic64_andnot(s64 i, atomic64_t *v)
 {
 	atomic64_and(~i, v);
@@ -1772,7 +1774,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
 #endif /* atomic64_fetch_andnot */
 
 #ifndef atomic64_fetch_andnot
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
 	return atomic64_fetch_and(~i, v);
@@ -1781,7 +1783,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_andnot_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
 	return atomic64_fetch_and_acquire(~i, v);
@@ -1790,7 +1792,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_andnot_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
 	return atomic64_fetch_and_release(~i, v);
@@ -1799,7 +1801,7 @@ atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_andnot_relaxed
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 {
 	return atomic64_fetch_and_relaxed(~i, v);
@@ -1810,7 +1812,7 @@ atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
 #else /* atomic64_fetch_andnot_relaxed */
 
 #ifndef atomic64_fetch_andnot_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_andnot_relaxed(i, v);
@@ -1821,7 +1823,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_andnot_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1831,7 +1833,7 @@ atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_andnot
-static inline s64
+static __always_inline s64
 atomic64_fetch_andnot(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1852,7 +1854,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t *v)
 #else /* atomic64_fetch_or_relaxed */
 
 #ifndef atomic64_fetch_or_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_or_relaxed(i, v);
@@ -1863,7 +1865,7 @@ atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_or_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_or_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1873,7 +1875,7 @@ atomic64_fetch_or_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_or
-static inline s64
+static __always_inline s64
 atomic64_fetch_or(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1894,7 +1896,7 @@ atomic64_fetch_or(s64 i, atomic64_t *v)
 #else /* atomic64_fetch_xor_relaxed */
 
 #ifndef atomic64_fetch_xor_acquire
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 {
 	s64 ret = atomic64_fetch_xor_relaxed(i, v);
@@ -1905,7 +1907,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_xor_release
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 {
 	__atomic_release_fence();
@@ -1915,7 +1917,7 @@ atomic64_fetch_xor_release(s64 i, atomic64_t *v)
 #endif
 
 #ifndef atomic64_fetch_xor
-static inline s64
+static __always_inline s64
 atomic64_fetch_xor(s64 i, atomic64_t *v)
 {
 	s64 ret;
@@ -1936,7 +1938,7 @@ atomic64_fetch_xor(s64 i, atomic64_t *v)
 #else /* atomic64_xchg_relaxed */
 
 #ifndef atomic64_xchg_acquire
-static inline s64
+static __always_inline s64
 atomic64_xchg_acquire(atomic64_t *v, s64 i)
 {
 	s64 ret = atomic64_xchg_relaxed(v, i);
@@ -1947,7 +1949,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64 i)
 #endif
 
 #ifndef atomic64_xchg_release
-static inline s64
+static __always_inline s64
 atomic64_xchg_release(atomic64_t *v, s64 i)
 {
 	__atomic_release_fence();
@@ -1957,7 +1959,7 @@ atomic64_xchg_release(atomic64_t *v, s64 i)
 #endif
 
 #ifndef atomic64_xchg
-static inline s64
+static __always_inline s64
 atomic64_xchg(atomic64_t *v, s64 i)
 {
 	s64 ret;
@@ -1978,7 +1980,7 @@ atomic64_xchg(atomic64_t *v, s64 i)
 #else /* atomic64_cmpxchg_relaxed */
 
 #ifndef atomic64_cmpxchg_acquire
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 {
 	s64 ret = atomic64_cmpxchg_relaxed(v, old, new);
@@ -1989,7 +1991,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
 #endif
 
 #ifndef atomic64_cmpxchg_release
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 {
 	__atomic_release_fence();
@@ -1999,7 +2001,7 @@ atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
 #endif
 
 #ifndef atomic64_cmpxchg
-static inline s64
+static __always_inline s64
 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
 	s64 ret;
@@ -2021,7 +2023,7 @@ atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 #endif /* atomic64_try_cmpxchg */
 
 #ifndef atomic64_try_cmpxchg
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
 	s64 r, o = *old;
@@ -2034,7 +2036,7 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 #endif
 
 #ifndef atomic64_try_cmpxchg_acquire
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
 	s64 r, o = *old;
@@ -2047,7 +2049,7 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 #endif
 
 #ifndef atomic64_try_cmpxchg_release
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
 	s64 r, o = *old;
@@ -2060,7 +2062,7 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 #endif
 
 #ifndef atomic64_try_cmpxchg_relaxed
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
 	s64 r, o = *old;
@@ -2075,7 +2077,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 #else /* atomic64_try_cmpxchg_relaxed */
 
 #ifndef atomic64_try_cmpxchg_acquire
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
 	bool ret = atomic64_try_cmpxchg_relaxed(v, old, new);
@@ -2086,7 +2088,7 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 #endif
 
 #ifndef atomic64_try_cmpxchg_release
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
 	__atomic_release_fence();
@@ -2096,7 +2098,7 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 #endif
 
 #ifndef atomic64_try_cmpxchg
-static inline bool
+static __always_inline bool
 atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
 	bool ret;
@@ -2120,7 +2122,7 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
  * true if the result is zero, or false for all
  * other cases.
  */
-static inline bool
+static __always_inline bool
 atomic64_sub_and_test(s64 i, atomic64_t *v)
 {
 	return atomic64_sub_return(i, v) == 0;
@@ -2137,7 +2139,7 @@ atomic64_sub_and_test(s64 i, atomic64_t *v)
  * returns true if the result is 0, or false for all other
  * cases.
  */
-static inline bool
+static __always_inline bool
 atomic64_dec_and_test(atomic64_t *v)
 {
 	return atomic64_dec_return(v) == 0;
@@ -2154,7 +2156,7 @@ atomic64_dec_and_test(atomic64_t *v)
  * and returns true if the result is zero, or false for all
  * other cases.
  */
-static inline bool
+static __always_inline bool
 atomic64_inc_and_test(atomic64_t *v)
 {
 	return atomic64_inc_return(v) == 0;
@@ -2172,7 +2174,7 @@ atomic64_inc_and_test(atomic64_t *v)
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
-static inline bool
+static __always_inline bool
 atomic64_add_negative(s64 i, atomic64_t *v)
 {
 	return atomic64_add_return(i, v) < 0;
@@ -2190,7 +2192,7 @@ atomic64_add_negative(s64 i, atomic64_t *v)
  * Atomically adds @a to @v, so long as @v was not already @u.
  * Returns original value of @v
  */
-static inline s64
+static __always_inline s64
 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
 	s64 c = atomic64_read(v);
@@ -2215,7 +2217,7 @@ atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
  * Atomically adds @a to @v, if @v was not already @u.
  * Returns true if the addition was done.
  */
-static inline bool
+static __always_inline bool
 atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
 {
 	return atomic64_fetch_add_unless(v, a, u) != u;
@@ -2231,7 +2233,7 @@ atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
  * Atomically increments @v by 1, if @v is non-zero.
  * Returns true if the increment was done.
  */
-static inline bool
+static __always_inline bool
 atomic64_inc_not_zero(atomic64_t *v)
 {
 	return atomic64_add_unless(v, 1, 0);
@@ -2240,7 +2242,7 @@ atomic64_inc_not_zero(atomic64_t *v)
 #endif
 
 #ifndef atomic64_inc_unless_negative
-static inline bool
+static __always_inline bool
 atomic64_inc_unless_negative(atomic64_t *v)
 {
 	s64 c = atomic64_read(v);
@@ -2256,7 +2258,7 @@ atomic64_inc_unless_negative(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_unless_positive
-static inline bool
+static __always_inline bool
 atomic64_dec_unless_positive(atomic64_t *v)
 {
 	s64 c = atomic64_read(v);
@@ -2272,7 +2274,7 @@ atomic64_dec_unless_positive(atomic64_t *v)
 #endif
 
 #ifndef atomic64_dec_if_positive
-static inline s64
+static __always_inline s64
 atomic64_dec_if_positive(atomic64_t *v)
 {
 	s64 dec, c = atomic64_read(v);
@@ -2292,4 +2294,4 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
 
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 25de4a2804d70f57e994fe3b419148658bb5378a
+// baaf45f4c24ed88ceae58baca39d7fd80bb8101b
diff --git a/scripts/atomic/fallbacks/acquire b/scripts/atomic/fallbacks/acquire
index e38871e64db6..ea489acc285e 100755
--- a/scripts/atomic/fallbacks/acquire
+++ b/scripts/atomic/fallbacks/acquire
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_${pfx}${name}${sfx}_acquire(${params})
 {
 	${ret} ret = ${atomic}_${pfx}${name}${sfx}_relaxed(${args});
diff --git a/scripts/atomic/fallbacks/add_negative b/scripts/atomic/fallbacks/add_negative
index e6f4815637de..03cc2e07fac5 100755
--- a/scripts/atomic/fallbacks/add_negative
+++ b/scripts/atomic/fallbacks/add_negative
@@ -8,7 +8,7 @@ cat <<EOF
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
-static inline bool
+static __always_inline bool
 ${atomic}_add_negative(${int} i, ${atomic}_t *v)
 {
 	return ${atomic}_add_return(i, v) < 0;
diff --git a/scripts/atomic/fallbacks/add_unless b/scripts/atomic/fallbacks/add_unless
index 792533885fbf..daf87a04c850 100755
--- a/scripts/atomic/fallbacks/add_unless
+++ b/scripts/atomic/fallbacks/add_unless
@@ -8,7 +8,7 @@ cat << EOF
  * Atomically adds @a to @v, if @v was not already @u.
  * Returns true if the addition was done.
  */
-static inline bool
+static __always_inline bool
 ${atomic}_add_unless(${atomic}_t *v, ${int} a, ${int} u)
 {
 	return ${atomic}_fetch_add_unless(v, a, u) != u;
diff --git a/scripts/atomic/fallbacks/andnot b/scripts/atomic/fallbacks/andnot
index 9f3a3216b5e3..14efce01225a 100755
--- a/scripts/atomic/fallbacks/andnot
+++ b/scripts/atomic/fallbacks/andnot
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_${pfx}andnot${sfx}${order}(${int} i, ${atomic}_t *v)
 {
 	${retstmt}${atomic}_${pfx}and${sfx}${order}(~i, v);
diff --git a/scripts/atomic/fallbacks/dec b/scripts/atomic/fallbacks/dec
index 10bbc82be31d..118282f3a5a3 100755
--- a/scripts/atomic/fallbacks/dec
+++ b/scripts/atomic/fallbacks/dec
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_${pfx}dec${sfx}${order}(${atomic}_t *v)
 {
 	${retstmt}${atomic}_${pfx}sub${sfx}${order}(1, v);
diff --git a/scripts/atomic/fallbacks/dec_and_test b/scripts/atomic/fallbacks/dec_and_test
index 0ce7103b3df2..f8967a891117 100755
--- a/scripts/atomic/fallbacks/dec_and_test
+++ b/scripts/atomic/fallbacks/dec_and_test
@@ -7,7 +7,7 @@ cat <<EOF
  * returns true if the result is 0, or false for all other
  * cases.
  */
-static inline bool
+static __always_inline bool
 ${atomic}_dec_and_test(${atomic}_t *v)
 {
 	return ${atomic}_dec_return(v) == 0;
diff --git a/scripts/atomic/fallbacks/dec_if_positive b/scripts/atomic/fallbacks/dec_if_positive
index c52eacec43c8..cfb380bd2da6 100755
--- a/scripts/atomic/fallbacks/dec_if_positive
+++ b/scripts/atomic/fallbacks/dec_if_positive
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_dec_if_positive(${atomic}_t *v)
 {
 	${int} dec, c = ${atomic}_read(v);
diff --git a/scripts/atomic/fallbacks/dec_unless_positive b/scripts/atomic/fallbacks/dec_unless_positive
index 8a2578f14268..69cb7aa01f9c 100755
--- a/scripts/atomic/fallbacks/dec_unless_positive
+++ b/scripts/atomic/fallbacks/dec_unless_positive
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline bool
+static __always_inline bool
 ${atomic}_dec_unless_positive(${atomic}_t *v)
 {
 	${int} c = ${atomic}_read(v);
diff --git a/scripts/atomic/fallbacks/fence b/scripts/atomic/fallbacks/fence
index 82f68fa6931a..92a3a4691bab 100755
--- a/scripts/atomic/fallbacks/fence
+++ b/scripts/atomic/fallbacks/fence
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_${pfx}${name}${sfx}(${params})
 {
 	${ret} ret;
diff --git a/scripts/atomic/fallbacks/fetch_add_unless b/scripts/atomic/fallbacks/fetch_add_unless
index d2c091db7eae..fffbc0d16fdf 100755
--- a/scripts/atomic/fallbacks/fetch_add_unless
+++ b/scripts/atomic/fallbacks/fetch_add_unless
@@ -8,7 +8,7 @@ cat << EOF
  * Atomically adds @a to @v, so long as @v was not already @u.
  * Returns original value of @v
  */
-static inline ${int}
+static __always_inline ${int}
 ${atomic}_fetch_add_unless(${atomic}_t *v, ${int} a, ${int} u)
 {
 	${int} c = ${atomic}_read(v);
diff --git a/scripts/atomic/fallbacks/inc b/scripts/atomic/fallbacks/inc
index f866b3ad2353..10751cd62829 100755
--- a/scripts/atomic/fallbacks/inc
+++ b/scripts/atomic/fallbacks/inc
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_${pfx}inc${sfx}${order}(${atomic}_t *v)
 {
 	${retstmt}${atomic}_${pfx}add${sfx}${order}(1, v);
diff --git a/scripts/atomic/fallbacks/inc_and_test b/scripts/atomic/fallbacks/inc_and_test
index 4e2068869f7e..4acea9c93604 100755
--- a/scripts/atomic/fallbacks/inc_and_test
+++ b/scripts/atomic/fallbacks/inc_and_test
@@ -7,7 +7,7 @@ cat <<EOF
  * and returns true if the result is zero, or false for all
  * other cases.
  */
-static inline bool
+static __always_inline bool
 ${atomic}_inc_and_test(${atomic}_t *v)
 {
 	return ${atomic}_inc_return(v) == 0;
diff --git a/scripts/atomic/fallbacks/inc_not_zero b/scripts/atomic/fallbacks/inc_not_zero
index a7c45c8d107c..d9f7b97aab42 100755
--- a/scripts/atomic/fallbacks/inc_not_zero
+++ b/scripts/atomic/fallbacks/inc_not_zero
@@ -6,7 +6,7 @@ cat <<EOF
  * Atomically increments @v by 1, if @v is non-zero.
  * Returns true if the increment was done.
  */
-static inline bool
+static __always_inline bool
 ${atomic}_inc_not_zero(${atomic}_t *v)
 {
 	return ${atomic}_add_unless(v, 1, 0);
diff --git a/scripts/atomic/fallbacks/inc_unless_negative b/scripts/atomic/fallbacks/inc_unless_negative
index 0c266e71dbd4..177a7cb51eda 100755
--- a/scripts/atomic/fallbacks/inc_unless_negative
+++ b/scripts/atomic/fallbacks/inc_unless_negative
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline bool
+static __always_inline bool
 ${atomic}_inc_unless_negative(${atomic}_t *v)
 {
 	${int} c = ${atomic}_read(v);
diff --git a/scripts/atomic/fallbacks/read_acquire b/scripts/atomic/fallbacks/read_acquire
index 75863b5203f7..12fa83cb3a6d 100755
--- a/scripts/atomic/fallbacks/read_acquire
+++ b/scripts/atomic/fallbacks/read_acquire
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_read_acquire(const ${atomic}_t *v)
 {
 	return smp_load_acquire(&(v)->counter);
diff --git a/scripts/atomic/fallbacks/release b/scripts/atomic/fallbacks/release
index 3f628a3802d9..730d2a6d3e07 100755
--- a/scripts/atomic/fallbacks/release
+++ b/scripts/atomic/fallbacks/release
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline ${ret}
+static __always_inline ${ret}
 ${atomic}_${pfx}${name}${sfx}_release(${params})
 {
 	__atomic_release_fence();
diff --git a/scripts/atomic/fallbacks/set_release b/scripts/atomic/fallbacks/set_release
index 45bb5e0cfc08..e5d72c717434 100755
--- a/scripts/atomic/fallbacks/set_release
+++ b/scripts/atomic/fallbacks/set_release
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline void
+static __always_inline void
 ${atomic}_set_release(${atomic}_t *v, ${int} i)
 {
 	smp_store_release(&(v)->counter, i);
diff --git a/scripts/atomic/fallbacks/sub_and_test b/scripts/atomic/fallbacks/sub_and_test
index 289ef17a2d7a..6cfe4ed49746 100755
--- a/scripts/atomic/fallbacks/sub_and_test
+++ b/scripts/atomic/fallbacks/sub_and_test
@@ -8,7 +8,7 @@ cat <<EOF
  * true if the result is zero, or false for all
  * other cases.
  */
-static inline bool
+static __always_inline bool
 ${atomic}_sub_and_test(${int} i, ${atomic}_t *v)
 {
 	return ${atomic}_sub_return(i, v) == 0;
diff --git a/scripts/atomic/fallbacks/try_cmpxchg b/scripts/atomic/fallbacks/try_cmpxchg
index 4ed85e2f5378..c7a26213b978 100755
--- a/scripts/atomic/fallbacks/try_cmpxchg
+++ b/scripts/atomic/fallbacks/try_cmpxchg
@@ -1,5 +1,5 @@
 cat <<EOF
-static inline bool
+static __always_inline bool
 ${atomic}_try_cmpxchg${order}(${atomic}_t *v, ${int} *old, ${int} new)
 {
 	${int} r, o = *old;
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 1bd7c1707633..b6c6f5d306a7 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -149,6 +149,8 @@ cat << EOF
 #ifndef _LINUX_ATOMIC_FALLBACK_H
 #define _LINUX_ATOMIC_FALLBACK_H
 
+#include <linux/compiler.h>
+
 EOF
 
 for xchg in "xchg" "cmpxchg" "cmpxchg64"; do
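As an aside on why this patch flips the generated wrappers from `inline` to `__always_inline`: plain `inline` is only a hint, and an out-of-line copy emitted by the compiler is a real symbol the function tracer can attach to. A minimal user-space sketch (stand-in names, not kernel code):

```c
/*
 * Sketch: plain "inline" may still produce an out-of-line copy that
 * ftrace can hook; __always_inline forces the body into the caller,
 * so the fallback can never become a traceable function of its own.
 */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical fallback wrapper in the style the scripts generate. */
static __always_inline int my_atomic_inc_return(int *counter)
{
	return ++*counter;	/* folded into the caller, no symbol emitted */
}

int demo_inc(void)
{
	int v = 41;

	return my_atomic_inc_return(&v);
}
```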



^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 19/22] compiler: Simple READ/WRITE_ONCE() implementations
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (17 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 20/22] locking/atomics: Flip fallbacks and instrumentation Peter Zijlstra
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Because I need WRITE_ONCE_NOCHECK(), and in anticipation of Will's
READ_ONCE() rewrite, provide __{READ,WRITE}_ONCE_SCALAR().


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/compiler.h |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -289,6 +289,14 @@ unsigned long read_word_at_a_time(const
 	__u.__val;					\
 })
 
+#define __READ_ONCE_SCALAR(x)			\
+	(*(const volatile typeof(x) *)&(x))
+
+#define __WRITE_ONCE_SCALAR(x, val)		\
+do {						\
+	*(volatile typeof(x) *)&(x) = val;	\
+} while (0)
+
 #endif /* __KERNEL__ */
 
 /*




* [PATCH v3 20/22] locking/atomics: Flip fallbacks and instrumentation
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (18 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 19/22] compiler: Simple READ/WRITE_ONCE() implementations Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 21/22] x86/int3: Avoid atomic instrumentation Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized Peter Zijlstra
  21 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Mark Rutland

Currently instrumentation of atomic primitives is done at the
architecture level, while composites or fallbacks are provided at the
generic level.

The result is that there are no uninstrumented variants of the
fallbacks. Since such uninstrumented variants are now needed (see the
next patch), invert this ordering.

Doing this means moving the instrumentation into the generic code as
well as having (for now) two variants of the fallbacks.

Notes:

 - the various *cond_read* primitives are not proper fallbacks
   and got moved into linux/atomic.h. No arch_ variants are
   generated because the base primitives smp_cond_load*()
   are instrumented.

 - once all architectures are moved over to arch_atomic_ we can remove
   one of the fallback variants and reclaim some 2300 lines.

 - atomic_{read,set}*() are no longer double-instrumented
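Schematically, the flipped layering puts the instrumentation in the generic wrapper on top of an uninstrumented arch_ primitive. A user-space sketch with hypothetical names (a counter stands in for the kasan/kcsan check calls):

```c
typedef struct { int counter; } my_atomic_t;

static int instrument_hits;	/* stand-in for instrument_atomic_write() */

/* Arch-level primitive: uninstrumented, usable from noinstr context. */
static inline int arch_my_atomic_add_return(int i, my_atomic_t *v)
{
	return v->counter += i;
}

/* Generic wrapper: after this patch, instrumentation lives here. */
static inline int my_atomic_add_return(int i, my_atomic_t *v)
{
	instrument_hits++;	/* check the access, then delegate */
	return arch_my_atomic_add_return(i, v);
}

int demo_layering(void)
{
	my_atomic_t v = { .counter = 40 };
	int ret = my_atomic_add_return(2, &v);

	return ret == 42 && instrument_hits == 1;
}
```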

Cc: Mark Rutland <mark.rutland@arm.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/atomic.h              |    6 
 arch/x86/include/asm/atomic.h                |   17 
 arch/x86/include/asm/atomic64_32.h           |    9 
 arch/x86/include/asm/atomic64_64.h           |   15 
 include/linux/atomic-arch-fallback.h         | 2291 +++++++++++++++++++++++++++
 include/linux/atomic-fallback.h              |    8 
 include/linux/atomic.h                       |   11 
 scripts/atomic/fallbacks/acquire             |    4 
 scripts/atomic/fallbacks/add_negative        |    6 
 scripts/atomic/fallbacks/add_unless          |    6 
 scripts/atomic/fallbacks/andnot              |    4 
 scripts/atomic/fallbacks/dec                 |    4 
 scripts/atomic/fallbacks/dec_and_test        |    6 
 scripts/atomic/fallbacks/dec_if_positive     |    6 
 scripts/atomic/fallbacks/dec_unless_positive |    6 
 scripts/atomic/fallbacks/fence               |    4 
 scripts/atomic/fallbacks/fetch_add_unless    |    8 
 scripts/atomic/fallbacks/inc                 |    4 
 scripts/atomic/fallbacks/inc_and_test        |    6 
 scripts/atomic/fallbacks/inc_not_zero        |    6 
 scripts/atomic/fallbacks/inc_unless_negative |    6 
 scripts/atomic/fallbacks/read_acquire        |    2 
 scripts/atomic/fallbacks/release             |    4 
 scripts/atomic/fallbacks/set_release         |    2 
 scripts/atomic/fallbacks/sub_and_test        |    6 
 scripts/atomic/fallbacks/try_cmpxchg         |    4 
 scripts/atomic/gen-atomic-fallback.sh        |   29 
 scripts/atomic/gen-atomics.sh                |    5 
 28 files changed, 2403 insertions(+), 82 deletions(-)

--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -101,8 +101,8 @@ static inline long arch_atomic64_dec_if_
 
 #define ATOMIC_INIT(i)	{ (i) }
 
-#define arch_atomic_read(v)			READ_ONCE((v)->counter)
-#define arch_atomic_set(v, i)			WRITE_ONCE(((v)->counter), (i))
+#define arch_atomic_read(v)			__READ_ONCE_SCALAR((v)->counter)
+#define arch_atomic_set(v, i)			__WRITE_ONCE_SCALAR(((v)->counter), (i))
 
 #define arch_atomic_add_return_relaxed		arch_atomic_add_return_relaxed
 #define arch_atomic_add_return_acquire		arch_atomic_add_return_acquire
@@ -225,6 +225,6 @@ static inline long arch_atomic64_dec_if_
 
 #define arch_atomic64_dec_if_positive		arch_atomic64_dec_if_positive
 
-#include <asm-generic/atomic-instrumented.h>
+#define ARCH_ATOMIC
 
 #endif /* __ASM_ATOMIC_H */
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -28,7 +28,7 @@ static __always_inline int arch_atomic_r
 	 * Note for KASAN: we deliberately don't use READ_ONCE_NOCHECK() here,
 	 * it's non-inlined function that increases binary size and stack usage.
 	 */
-	return READ_ONCE((v)->counter);
+	return __READ_ONCE_SCALAR((v)->counter);
 }
 
 /**
@@ -40,7 +40,7 @@ static __always_inline int arch_atomic_r
  */
 static __always_inline void arch_atomic_set(atomic_t *v, int i)
 {
-	WRITE_ONCE(v->counter, i);
+	__WRITE_ONCE_SCALAR(v->counter, i);
 }
 
 /**
@@ -166,6 +166,7 @@ static __always_inline int arch_atomic_a
 {
 	return i + xadd(&v->counter, i);
 }
+#define arch_atomic_add_return arch_atomic_add_return
 
 /**
  * arch_atomic_sub_return - subtract integer and return
@@ -178,32 +179,37 @@ static __always_inline int arch_atomic_s
 {
 	return arch_atomic_add_return(-i, v);
 }
+#define arch_atomic_sub_return arch_atomic_sub_return
 
 static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
 {
 	return xadd(&v->counter, i);
 }
+#define arch_atomic_fetch_add arch_atomic_fetch_add
 
 static __always_inline int arch_atomic_fetch_sub(int i, atomic_t *v)
 {
 	return xadd(&v->counter, -i);
 }
+#define arch_atomic_fetch_sub arch_atomic_fetch_sub
 
 static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
 {
 	return arch_cmpxchg(&v->counter, old, new);
 }
+#define arch_atomic_cmpxchg arch_atomic_cmpxchg
 
-#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
 static __always_inline bool arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
 	return try_cmpxchg(&v->counter, old, new);
 }
+#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
 
 static inline int arch_atomic_xchg(atomic_t *v, int new)
 {
 	return arch_xchg(&v->counter, new);
 }
+#define arch_atomic_xchg arch_atomic_xchg
 
 static inline void arch_atomic_and(int i, atomic_t *v)
 {
@@ -221,6 +227,7 @@ static inline int arch_atomic_fetch_and(
 
 	return val;
 }
+#define arch_atomic_fetch_and arch_atomic_fetch_and
 
 static inline void arch_atomic_or(int i, atomic_t *v)
 {
@@ -238,6 +245,7 @@ static inline int arch_atomic_fetch_or(i
 
 	return val;
 }
+#define arch_atomic_fetch_or arch_atomic_fetch_or
 
 static inline void arch_atomic_xor(int i, atomic_t *v)
 {
@@ -255,6 +263,7 @@ static inline int arch_atomic_fetch_xor(
 
 	return val;
 }
+#define arch_atomic_fetch_xor arch_atomic_fetch_xor
 
 #ifdef CONFIG_X86_32
 # include <asm/atomic64_32.h>
@@ -262,6 +271,6 @@ static inline int arch_atomic_fetch_xor(
 # include <asm/atomic64_64.h>
 #endif
 
-#include <asm-generic/atomic-instrumented.h>
+#define ARCH_ATOMIC
 
 #endif /* _ASM_X86_ATOMIC_H */
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -75,6 +75,7 @@ static inline s64 arch_atomic64_cmpxchg(
 {
 	return arch_cmpxchg64(&v->counter, o, n);
 }
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
 /**
  * arch_atomic64_xchg - xchg atomic64 variable
@@ -94,6 +95,7 @@ static inline s64 arch_atomic64_xchg(ato
 			     : "memory");
 	return o;
 }
+#define arch_atomic64_xchg arch_atomic64_xchg
 
 /**
  * arch_atomic64_set - set atomic64 variable
@@ -138,6 +140,7 @@ static inline s64 arch_atomic64_add_retu
 			     ASM_NO_INPUT_CLOBBER("memory"));
 	return i;
 }
+#define arch_atomic64_add_return arch_atomic64_add_return
 
 /*
  * Other variants with different arithmetic operators:
@@ -149,6 +152,7 @@ static inline s64 arch_atomic64_sub_retu
 			     ASM_NO_INPUT_CLOBBER("memory"));
 	return i;
 }
+#define arch_atomic64_sub_return arch_atomic64_sub_return
 
 static inline s64 arch_atomic64_inc_return(atomic64_t *v)
 {
@@ -242,6 +246,7 @@ static inline int arch_atomic64_add_unle
 			     "S" (v) : "memory");
 	return (int)a;
 }
+#define arch_atomic64_add_unless arch_atomic64_add_unless
 
 static inline int arch_atomic64_inc_not_zero(atomic64_t *v)
 {
@@ -281,6 +286,7 @@ static inline s64 arch_atomic64_fetch_an
 
 	return old;
 }
+#define arch_atomic64_fetch_and arch_atomic64_fetch_and
 
 static inline void arch_atomic64_or(s64 i, atomic64_t *v)
 {
@@ -299,6 +305,7 @@ static inline s64 arch_atomic64_fetch_or
 
 	return old;
 }
+#define arch_atomic64_fetch_or arch_atomic64_fetch_or
 
 static inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 {
@@ -317,6 +324,7 @@ static inline s64 arch_atomic64_fetch_xo
 
 	return old;
 }
+#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
 
 static inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
@@ -327,6 +335,7 @@ static inline s64 arch_atomic64_fetch_ad
 
 	return old;
 }
+#define arch_atomic64_fetch_add arch_atomic64_fetch_add
 
 #define arch_atomic64_fetch_sub(i, v)	arch_atomic64_fetch_add(-(i), (v))
 
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -19,7 +19,7 @@
  */
 static inline s64 arch_atomic64_read(const atomic64_t *v)
 {
-	return READ_ONCE((v)->counter);
+	return __READ_ONCE_SCALAR((v)->counter);
 }
 
 /**
@@ -31,7 +31,7 @@ static inline s64 arch_atomic64_read(con
  */
 static inline void arch_atomic64_set(atomic64_t *v, s64 i)
 {
-	WRITE_ONCE(v->counter, i);
+	__WRITE_ONCE_SCALAR(v->counter, i);
 }
 
 /**
@@ -159,37 +159,43 @@ static __always_inline s64 arch_atomic64
 {
 	return i + xadd(&v->counter, i);
 }
+#define arch_atomic64_add_return arch_atomic64_add_return
 
 static inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v)
 {
 	return arch_atomic64_add_return(-i, v);
 }
+#define arch_atomic64_sub_return arch_atomic64_sub_return
 
 static inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v)
 {
 	return xadd(&v->counter, i);
 }
+#define arch_atomic64_fetch_add arch_atomic64_fetch_add
 
 static inline s64 arch_atomic64_fetch_sub(s64 i, atomic64_t *v)
 {
 	return xadd(&v->counter, -i);
 }
+#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
 
 static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
 	return arch_cmpxchg(&v->counter, old, new);
 }
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
 
-#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
 static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
 	return try_cmpxchg(&v->counter, old, new);
 }
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
 
 static inline s64 arch_atomic64_xchg(atomic64_t *v, s64 new)
 {
 	return arch_xchg(&v->counter, new);
 }
+#define arch_atomic64_xchg arch_atomic64_xchg
 
 static inline void arch_atomic64_and(s64 i, atomic64_t *v)
 {
@@ -207,6 +213,7 @@ static inline s64 arch_atomic64_fetch_an
 	} while (!arch_atomic64_try_cmpxchg(v, &val, val & i));
 	return val;
 }
+#define arch_atomic64_fetch_and arch_atomic64_fetch_and
 
 static inline void arch_atomic64_or(s64 i, atomic64_t *v)
 {
@@ -224,6 +231,7 @@ static inline s64 arch_atomic64_fetch_or
 	} while (!arch_atomic64_try_cmpxchg(v, &val, val | i));
 	return val;
 }
+#define arch_atomic64_fetch_or arch_atomic64_fetch_or
 
 static inline void arch_atomic64_xor(s64 i, atomic64_t *v)
 {
@@ -241,5 +249,6 @@ static inline s64 arch_atomic64_fetch_xo
 	} while (!arch_atomic64_try_cmpxchg(v, &val, val ^ i));
 	return val;
 }
+#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
 
 #endif /* _ASM_X86_ATOMIC64_64_H */
--- /dev/null
+++ b/include/linux/atomic-arch-fallback.h
@@ -0,0 +1,2291 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Generated by scripts/atomic/gen-atomic-fallback.sh
+// DO NOT MODIFY THIS FILE DIRECTLY
+
+#ifndef _LINUX_ATOMIC_FALLBACK_H
+#define _LINUX_ATOMIC_FALLBACK_H
+
+#include <linux/compiler.h>
+
+#ifndef arch_xchg_relaxed
+#define arch_xchg_relaxed		arch_xchg
+#define arch_xchg_acquire		arch_xchg
+#define arch_xchg_release		arch_xchg
+#else /* arch_xchg_relaxed */
+
+#ifndef arch_xchg_acquire
+#define arch_xchg_acquire(...) \
+	__atomic_op_acquire(arch_xchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_xchg_release
+#define arch_xchg_release(...) \
+	__atomic_op_release(arch_xchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_xchg
+#define arch_xchg(...) \
+	__atomic_op_fence(arch_xchg, __VA_ARGS__)
+#endif
+
+#endif /* arch_xchg_relaxed */
+
+#ifndef arch_cmpxchg_relaxed
+#define arch_cmpxchg_relaxed		arch_cmpxchg
+#define arch_cmpxchg_acquire		arch_cmpxchg
+#define arch_cmpxchg_release		arch_cmpxchg
+#else /* arch_cmpxchg_relaxed */
+
+#ifndef arch_cmpxchg_acquire
+#define arch_cmpxchg_acquire(...) \
+	__atomic_op_acquire(arch_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_cmpxchg_release
+#define arch_cmpxchg_release(...) \
+	__atomic_op_release(arch_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_cmpxchg
+#define arch_cmpxchg(...) \
+	__atomic_op_fence(arch_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* arch_cmpxchg_relaxed */
+
+#ifndef arch_cmpxchg64_relaxed
+#define arch_cmpxchg64_relaxed		arch_cmpxchg64
+#define arch_cmpxchg64_acquire		arch_cmpxchg64
+#define arch_cmpxchg64_release		arch_cmpxchg64
+#else /* arch_cmpxchg64_relaxed */
+
+#ifndef arch_cmpxchg64_acquire
+#define arch_cmpxchg64_acquire(...) \
+	__atomic_op_acquire(arch_cmpxchg64, __VA_ARGS__)
+#endif
+
+#ifndef arch_cmpxchg64_release
+#define arch_cmpxchg64_release(...) \
+	__atomic_op_release(arch_cmpxchg64, __VA_ARGS__)
+#endif
+
+#ifndef arch_cmpxchg64
+#define arch_cmpxchg64(...) \
+	__atomic_op_fence(arch_cmpxchg64, __VA_ARGS__)
+#endif
+
+#endif /* arch_cmpxchg64_relaxed */
+
+#ifndef arch_atomic_read_acquire
+static __always_inline int
+arch_atomic_read_acquire(const atomic_t *v)
+{
+	return smp_load_acquire(&(v)->counter);
+}
+#define arch_atomic_read_acquire arch_atomic_read_acquire
+#endif
+
+#ifndef arch_atomic_set_release
+static __always_inline void
+arch_atomic_set_release(atomic_t *v, int i)
+{
+	smp_store_release(&(v)->counter, i);
+}
+#define arch_atomic_set_release arch_atomic_set_release
+#endif
+
+#ifndef arch_atomic_add_return_relaxed
+#define arch_atomic_add_return_acquire arch_atomic_add_return
+#define arch_atomic_add_return_release arch_atomic_add_return
+#define arch_atomic_add_return_relaxed arch_atomic_add_return
+#else /* arch_atomic_add_return_relaxed */
+
+#ifndef arch_atomic_add_return_acquire
+static __always_inline int
+arch_atomic_add_return_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_add_return_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_add_return_acquire arch_atomic_add_return_acquire
+#endif
+
+#ifndef arch_atomic_add_return_release
+static __always_inline int
+arch_atomic_add_return_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_add_return_relaxed(i, v);
+}
+#define arch_atomic_add_return_release arch_atomic_add_return_release
+#endif
+
+#ifndef arch_atomic_add_return
+static __always_inline int
+arch_atomic_add_return(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_add_return_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_add_return arch_atomic_add_return
+#endif
+
+#endif /* arch_atomic_add_return_relaxed */
+
+#ifndef arch_atomic_fetch_add_relaxed
+#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add
+#define arch_atomic_fetch_add_release arch_atomic_fetch_add
+#define arch_atomic_fetch_add_relaxed arch_atomic_fetch_add
+#else /* arch_atomic_fetch_add_relaxed */
+
+#ifndef arch_atomic_fetch_add_acquire
+static __always_inline int
+arch_atomic_fetch_add_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_fetch_add_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add_acquire
+#endif
+
+#ifndef arch_atomic_fetch_add_release
+static __always_inline int
+arch_atomic_fetch_add_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_add_relaxed(i, v);
+}
+#define arch_atomic_fetch_add_release arch_atomic_fetch_add_release
+#endif
+
+#ifndef arch_atomic_fetch_add
+static __always_inline int
+arch_atomic_fetch_add(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_add_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_add arch_atomic_fetch_add
+#endif
+
+#endif /* arch_atomic_fetch_add_relaxed */
+
+#ifndef arch_atomic_sub_return_relaxed
+#define arch_atomic_sub_return_acquire arch_atomic_sub_return
+#define arch_atomic_sub_return_release arch_atomic_sub_return
+#define arch_atomic_sub_return_relaxed arch_atomic_sub_return
+#else /* arch_atomic_sub_return_relaxed */
+
+#ifndef arch_atomic_sub_return_acquire
+static __always_inline int
+arch_atomic_sub_return_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_sub_return_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_sub_return_acquire arch_atomic_sub_return_acquire
+#endif
+
+#ifndef arch_atomic_sub_return_release
+static __always_inline int
+arch_atomic_sub_return_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_sub_return_relaxed(i, v);
+}
+#define arch_atomic_sub_return_release arch_atomic_sub_return_release
+#endif
+
+#ifndef arch_atomic_sub_return
+static __always_inline int
+arch_atomic_sub_return(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_sub_return_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_sub_return arch_atomic_sub_return
+#endif
+
+#endif /* arch_atomic_sub_return_relaxed */
+
+#ifndef arch_atomic_fetch_sub_relaxed
+#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub
+#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub
+#define arch_atomic_fetch_sub_relaxed arch_atomic_fetch_sub
+#else /* arch_atomic_fetch_sub_relaxed */
+
+#ifndef arch_atomic_fetch_sub_acquire
+static __always_inline int
+arch_atomic_fetch_sub_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_fetch_sub_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub_acquire
+#endif
+
+#ifndef arch_atomic_fetch_sub_release
+static __always_inline int
+arch_atomic_fetch_sub_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_sub_relaxed(i, v);
+}
+#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub_release
+#endif
+
+#ifndef arch_atomic_fetch_sub
+static __always_inline int
+arch_atomic_fetch_sub(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_sub_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_sub arch_atomic_fetch_sub
+#endif
+
+#endif /* arch_atomic_fetch_sub_relaxed */
+
+#ifndef arch_atomic_inc
+static __always_inline void
+arch_atomic_inc(atomic_t *v)
+{
+	arch_atomic_add(1, v);
+}
+#define arch_atomic_inc arch_atomic_inc
+#endif
+
+#ifndef arch_atomic_inc_return_relaxed
+#ifdef arch_atomic_inc_return
+#define arch_atomic_inc_return_acquire arch_atomic_inc_return
+#define arch_atomic_inc_return_release arch_atomic_inc_return
+#define arch_atomic_inc_return_relaxed arch_atomic_inc_return
+#endif /* arch_atomic_inc_return */
+
+#ifndef arch_atomic_inc_return
+static __always_inline int
+arch_atomic_inc_return(atomic_t *v)
+{
+	return arch_atomic_add_return(1, v);
+}
+#define arch_atomic_inc_return arch_atomic_inc_return
+#endif
+
+#ifndef arch_atomic_inc_return_acquire
+static __always_inline int
+arch_atomic_inc_return_acquire(atomic_t *v)
+{
+	return arch_atomic_add_return_acquire(1, v);
+}
+#define arch_atomic_inc_return_acquire arch_atomic_inc_return_acquire
+#endif
+
+#ifndef arch_atomic_inc_return_release
+static __always_inline int
+arch_atomic_inc_return_release(atomic_t *v)
+{
+	return arch_atomic_add_return_release(1, v);
+}
+#define arch_atomic_inc_return_release arch_atomic_inc_return_release
+#endif
+
+#ifndef arch_atomic_inc_return_relaxed
+static __always_inline int
+arch_atomic_inc_return_relaxed(atomic_t *v)
+{
+	return arch_atomic_add_return_relaxed(1, v);
+}
+#define arch_atomic_inc_return_relaxed arch_atomic_inc_return_relaxed
+#endif
+
+#else /* arch_atomic_inc_return_relaxed */
+
+#ifndef arch_atomic_inc_return_acquire
+static __always_inline int
+arch_atomic_inc_return_acquire(atomic_t *v)
+{
+	int ret = arch_atomic_inc_return_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_inc_return_acquire arch_atomic_inc_return_acquire
+#endif
+
+#ifndef arch_atomic_inc_return_release
+static __always_inline int
+arch_atomic_inc_return_release(atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_inc_return_relaxed(v);
+}
+#define arch_atomic_inc_return_release arch_atomic_inc_return_release
+#endif
+
+#ifndef arch_atomic_inc_return
+static __always_inline int
+arch_atomic_inc_return(atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_inc_return_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_inc_return arch_atomic_inc_return
+#endif
+
+#endif /* arch_atomic_inc_return_relaxed */
+
+#ifndef arch_atomic_fetch_inc_relaxed
+#ifdef arch_atomic_fetch_inc
+#define arch_atomic_fetch_inc_acquire arch_atomic_fetch_inc
+#define arch_atomic_fetch_inc_release arch_atomic_fetch_inc
+#define arch_atomic_fetch_inc_relaxed arch_atomic_fetch_inc
+#endif /* arch_atomic_fetch_inc */
+
+#ifndef arch_atomic_fetch_inc
+static __always_inline int
+arch_atomic_fetch_inc(atomic_t *v)
+{
+	return arch_atomic_fetch_add(1, v);
+}
+#define arch_atomic_fetch_inc arch_atomic_fetch_inc
+#endif
+
+#ifndef arch_atomic_fetch_inc_acquire
+static __always_inline int
+arch_atomic_fetch_inc_acquire(atomic_t *v)
+{
+	return arch_atomic_fetch_add_acquire(1, v);
+}
+#define arch_atomic_fetch_inc_acquire arch_atomic_fetch_inc_acquire
+#endif
+
+#ifndef arch_atomic_fetch_inc_release
+static __always_inline int
+arch_atomic_fetch_inc_release(atomic_t *v)
+{
+	return arch_atomic_fetch_add_release(1, v);
+}
+#define arch_atomic_fetch_inc_release arch_atomic_fetch_inc_release
+#endif
+
+#ifndef arch_atomic_fetch_inc_relaxed
+static __always_inline int
+arch_atomic_fetch_inc_relaxed(atomic_t *v)
+{
+	return arch_atomic_fetch_add_relaxed(1, v);
+}
+#define arch_atomic_fetch_inc_relaxed arch_atomic_fetch_inc_relaxed
+#endif
+
+#else /* arch_atomic_fetch_inc_relaxed */
+
+#ifndef arch_atomic_fetch_inc_acquire
+static __always_inline int
+arch_atomic_fetch_inc_acquire(atomic_t *v)
+{
+	int ret = arch_atomic_fetch_inc_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_inc_acquire arch_atomic_fetch_inc_acquire
+#endif
+
+#ifndef arch_atomic_fetch_inc_release
+static __always_inline int
+arch_atomic_fetch_inc_release(atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_inc_relaxed(v);
+}
+#define arch_atomic_fetch_inc_release arch_atomic_fetch_inc_release
+#endif
+
+#ifndef arch_atomic_fetch_inc
+static __always_inline int
+arch_atomic_fetch_inc(atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_inc_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_inc arch_atomic_fetch_inc
+#endif
+
+#endif /* arch_atomic_fetch_inc_relaxed */
+
+#ifndef arch_atomic_dec
+static __always_inline void
+arch_atomic_dec(atomic_t *v)
+{
+	arch_atomic_sub(1, v);
+}
+#define arch_atomic_dec arch_atomic_dec
+#endif
+
+#ifndef arch_atomic_dec_return_relaxed
+#ifdef arch_atomic_dec_return
+#define arch_atomic_dec_return_acquire arch_atomic_dec_return
+#define arch_atomic_dec_return_release arch_atomic_dec_return
+#define arch_atomic_dec_return_relaxed arch_atomic_dec_return
+#endif /* arch_atomic_dec_return */
+
+#ifndef arch_atomic_dec_return
+static __always_inline int
+arch_atomic_dec_return(atomic_t *v)
+{
+	return arch_atomic_sub_return(1, v);
+}
+#define arch_atomic_dec_return arch_atomic_dec_return
+#endif
+
+#ifndef arch_atomic_dec_return_acquire
+static __always_inline int
+arch_atomic_dec_return_acquire(atomic_t *v)
+{
+	return arch_atomic_sub_return_acquire(1, v);
+}
+#define arch_atomic_dec_return_acquire arch_atomic_dec_return_acquire
+#endif
+
+#ifndef arch_atomic_dec_return_release
+static __always_inline int
+arch_atomic_dec_return_release(atomic_t *v)
+{
+	return arch_atomic_sub_return_release(1, v);
+}
+#define arch_atomic_dec_return_release arch_atomic_dec_return_release
+#endif
+
+#ifndef arch_atomic_dec_return_relaxed
+static __always_inline int
+arch_atomic_dec_return_relaxed(atomic_t *v)
+{
+	return arch_atomic_sub_return_relaxed(1, v);
+}
+#define arch_atomic_dec_return_relaxed arch_atomic_dec_return_relaxed
+#endif
+
+#else /* arch_atomic_dec_return_relaxed */
+
+#ifndef arch_atomic_dec_return_acquire
+static __always_inline int
+arch_atomic_dec_return_acquire(atomic_t *v)
+{
+	int ret = arch_atomic_dec_return_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_dec_return_acquire arch_atomic_dec_return_acquire
+#endif
+
+#ifndef arch_atomic_dec_return_release
+static __always_inline int
+arch_atomic_dec_return_release(atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_dec_return_relaxed(v);
+}
+#define arch_atomic_dec_return_release arch_atomic_dec_return_release
+#endif
+
+#ifndef arch_atomic_dec_return
+static __always_inline int
+arch_atomic_dec_return(atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_dec_return_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_dec_return arch_atomic_dec_return
+#endif
+
+#endif /* arch_atomic_dec_return_relaxed */
+
+#ifndef arch_atomic_fetch_dec_relaxed
+#ifdef arch_atomic_fetch_dec
+#define arch_atomic_fetch_dec_acquire arch_atomic_fetch_dec
+#define arch_atomic_fetch_dec_release arch_atomic_fetch_dec
+#define arch_atomic_fetch_dec_relaxed arch_atomic_fetch_dec
+#endif /* arch_atomic_fetch_dec */
+
+#ifndef arch_atomic_fetch_dec
+static __always_inline int
+arch_atomic_fetch_dec(atomic_t *v)
+{
+	return arch_atomic_fetch_sub(1, v);
+}
+#define arch_atomic_fetch_dec arch_atomic_fetch_dec
+#endif
+
+#ifndef arch_atomic_fetch_dec_acquire
+static __always_inline int
+arch_atomic_fetch_dec_acquire(atomic_t *v)
+{
+	return arch_atomic_fetch_sub_acquire(1, v);
+}
+#define arch_atomic_fetch_dec_acquire arch_atomic_fetch_dec_acquire
+#endif
+
+#ifndef arch_atomic_fetch_dec_release
+static __always_inline int
+arch_atomic_fetch_dec_release(atomic_t *v)
+{
+	return arch_atomic_fetch_sub_release(1, v);
+}
+#define arch_atomic_fetch_dec_release arch_atomic_fetch_dec_release
+#endif
+
+#ifndef arch_atomic_fetch_dec_relaxed
+static __always_inline int
+arch_atomic_fetch_dec_relaxed(atomic_t *v)
+{
+	return arch_atomic_fetch_sub_relaxed(1, v);
+}
+#define arch_atomic_fetch_dec_relaxed arch_atomic_fetch_dec_relaxed
+#endif
+
+#else /* arch_atomic_fetch_dec_relaxed */
+
+#ifndef arch_atomic_fetch_dec_acquire
+static __always_inline int
+arch_atomic_fetch_dec_acquire(atomic_t *v)
+{
+	int ret = arch_atomic_fetch_dec_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_dec_acquire arch_atomic_fetch_dec_acquire
+#endif
+
+#ifndef arch_atomic_fetch_dec_release
+static __always_inline int
+arch_atomic_fetch_dec_release(atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_dec_relaxed(v);
+}
+#define arch_atomic_fetch_dec_release arch_atomic_fetch_dec_release
+#endif
+
+#ifndef arch_atomic_fetch_dec
+static __always_inline int
+arch_atomic_fetch_dec(atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_dec_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_dec arch_atomic_fetch_dec
+#endif
+
+#endif /* arch_atomic_fetch_dec_relaxed */
+
+#ifndef arch_atomic_fetch_and_relaxed
+#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and
+#define arch_atomic_fetch_and_release arch_atomic_fetch_and
+#define arch_atomic_fetch_and_relaxed arch_atomic_fetch_and
+#else /* arch_atomic_fetch_and_relaxed */
+
+#ifndef arch_atomic_fetch_and_acquire
+static __always_inline int
+arch_atomic_fetch_and_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_fetch_and_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and_acquire
+#endif
+
+#ifndef arch_atomic_fetch_and_release
+static __always_inline int
+arch_atomic_fetch_and_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_and_relaxed(i, v);
+}
+#define arch_atomic_fetch_and_release arch_atomic_fetch_and_release
+#endif
+
+#ifndef arch_atomic_fetch_and
+static __always_inline int
+arch_atomic_fetch_and(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_and_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_and arch_atomic_fetch_and
+#endif
+
+#endif /* arch_atomic_fetch_and_relaxed */
+
+#ifndef arch_atomic_andnot
+static __always_inline void
+arch_atomic_andnot(int i, atomic_t *v)
+{
+	arch_atomic_and(~i, v);
+}
+#define arch_atomic_andnot arch_atomic_andnot
+#endif
+
+#ifndef arch_atomic_fetch_andnot_relaxed
+#ifdef arch_atomic_fetch_andnot
+#define arch_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot
+#define arch_atomic_fetch_andnot_release arch_atomic_fetch_andnot
+#define arch_atomic_fetch_andnot_relaxed arch_atomic_fetch_andnot
+#endif /* arch_atomic_fetch_andnot */
+
+#ifndef arch_atomic_fetch_andnot
+static __always_inline int
+arch_atomic_fetch_andnot(int i, atomic_t *v)
+{
+	return arch_atomic_fetch_and(~i, v);
+}
+#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
+#endif
+
+#ifndef arch_atomic_fetch_andnot_acquire
+static __always_inline int
+arch_atomic_fetch_andnot_acquire(int i, atomic_t *v)
+{
+	return arch_atomic_fetch_and_acquire(~i, v);
+}
+#define arch_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
+#endif
+
+#ifndef arch_atomic_fetch_andnot_release
+static __always_inline int
+arch_atomic_fetch_andnot_release(int i, atomic_t *v)
+{
+	return arch_atomic_fetch_and_release(~i, v);
+}
+#define arch_atomic_fetch_andnot_release arch_atomic_fetch_andnot_release
+#endif
+
+#ifndef arch_atomic_fetch_andnot_relaxed
+static __always_inline int
+arch_atomic_fetch_andnot_relaxed(int i, atomic_t *v)
+{
+	return arch_atomic_fetch_and_relaxed(~i, v);
+}
+#define arch_atomic_fetch_andnot_relaxed arch_atomic_fetch_andnot_relaxed
+#endif
+
+#else /* arch_atomic_fetch_andnot_relaxed */
+
+#ifndef arch_atomic_fetch_andnot_acquire
+static __always_inline int
+arch_atomic_fetch_andnot_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_fetch_andnot_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_andnot_acquire arch_atomic_fetch_andnot_acquire
+#endif
+
+#ifndef arch_atomic_fetch_andnot_release
+static __always_inline int
+arch_atomic_fetch_andnot_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_andnot_relaxed(i, v);
+}
+#define arch_atomic_fetch_andnot_release arch_atomic_fetch_andnot_release
+#endif
+
+#ifndef arch_atomic_fetch_andnot
+static __always_inline int
+arch_atomic_fetch_andnot(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_andnot_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_andnot arch_atomic_fetch_andnot
+#endif
+
+#endif /* arch_atomic_fetch_andnot_relaxed */
+
+#ifndef arch_atomic_fetch_or_relaxed
+#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or
+#define arch_atomic_fetch_or_release arch_atomic_fetch_or
+#define arch_atomic_fetch_or_relaxed arch_atomic_fetch_or
+#else /* arch_atomic_fetch_or_relaxed */
+
+#ifndef arch_atomic_fetch_or_acquire
+static __always_inline int
+arch_atomic_fetch_or_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_fetch_or_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or_acquire
+#endif
+
+#ifndef arch_atomic_fetch_or_release
+static __always_inline int
+arch_atomic_fetch_or_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_or_relaxed(i, v);
+}
+#define arch_atomic_fetch_or_release arch_atomic_fetch_or_release
+#endif
+
+#ifndef arch_atomic_fetch_or
+static __always_inline int
+arch_atomic_fetch_or(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_or_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_or arch_atomic_fetch_or
+#endif
+
+#endif /* arch_atomic_fetch_or_relaxed */
+
+#ifndef arch_atomic_fetch_xor_relaxed
+#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor
+#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor
+#define arch_atomic_fetch_xor_relaxed arch_atomic_fetch_xor
+#else /* arch_atomic_fetch_xor_relaxed */
+
+#ifndef arch_atomic_fetch_xor_acquire
+static __always_inline int
+arch_atomic_fetch_xor_acquire(int i, atomic_t *v)
+{
+	int ret = arch_atomic_fetch_xor_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor_acquire
+#endif
+
+#ifndef arch_atomic_fetch_xor_release
+static __always_inline int
+arch_atomic_fetch_xor_release(int i, atomic_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic_fetch_xor_relaxed(i, v);
+}
+#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor_release
+#endif
+
+#ifndef arch_atomic_fetch_xor
+static __always_inline int
+arch_atomic_fetch_xor(int i, atomic_t *v)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_fetch_xor_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_fetch_xor arch_atomic_fetch_xor
+#endif
+
+#endif /* arch_atomic_fetch_xor_relaxed */
+
+#ifndef arch_atomic_xchg_relaxed
+#define arch_atomic_xchg_acquire arch_atomic_xchg
+#define arch_atomic_xchg_release arch_atomic_xchg
+#define arch_atomic_xchg_relaxed arch_atomic_xchg
+#else /* arch_atomic_xchg_relaxed */
+
+#ifndef arch_atomic_xchg_acquire
+static __always_inline int
+arch_atomic_xchg_acquire(atomic_t *v, int i)
+{
+	int ret = arch_atomic_xchg_relaxed(v, i);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_xchg_acquire arch_atomic_xchg_acquire
+#endif
+
+#ifndef arch_atomic_xchg_release
+static __always_inline int
+arch_atomic_xchg_release(atomic_t *v, int i)
+{
+	__atomic_release_fence();
+	return arch_atomic_xchg_relaxed(v, i);
+}
+#define arch_atomic_xchg_release arch_atomic_xchg_release
+#endif
+
+#ifndef arch_atomic_xchg
+static __always_inline int
+arch_atomic_xchg(atomic_t *v, int i)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_xchg_relaxed(v, i);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_xchg arch_atomic_xchg
+#endif
+
+#endif /* arch_atomic_xchg_relaxed */
+
+#ifndef arch_atomic_cmpxchg_relaxed
+#define arch_atomic_cmpxchg_acquire arch_atomic_cmpxchg
+#define arch_atomic_cmpxchg_release arch_atomic_cmpxchg
+#define arch_atomic_cmpxchg_relaxed arch_atomic_cmpxchg
+#else /* arch_atomic_cmpxchg_relaxed */
+
+#ifndef arch_atomic_cmpxchg_acquire
+static __always_inline int
+arch_atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
+{
+	int ret = arch_atomic_cmpxchg_relaxed(v, old, new);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_cmpxchg_acquire arch_atomic_cmpxchg_acquire
+#endif
+
+#ifndef arch_atomic_cmpxchg_release
+static __always_inline int
+arch_atomic_cmpxchg_release(atomic_t *v, int old, int new)
+{
+	__atomic_release_fence();
+	return arch_atomic_cmpxchg_relaxed(v, old, new);
+}
+#define arch_atomic_cmpxchg_release arch_atomic_cmpxchg_release
+#endif
+
+#ifndef arch_atomic_cmpxchg
+static __always_inline int
+arch_atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+	int ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_cmpxchg_relaxed(v, old, new);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_cmpxchg arch_atomic_cmpxchg
+#endif
+
+#endif /* arch_atomic_cmpxchg_relaxed */
+
+#ifndef arch_atomic_try_cmpxchg_relaxed
+#ifdef arch_atomic_try_cmpxchg
+#define arch_atomic_try_cmpxchg_acquire arch_atomic_try_cmpxchg
+#define arch_atomic_try_cmpxchg_release arch_atomic_try_cmpxchg
+#define arch_atomic_try_cmpxchg_relaxed arch_atomic_try_cmpxchg
+#endif /* arch_atomic_try_cmpxchg */
+
+#ifndef arch_atomic_try_cmpxchg
+static __always_inline bool
+arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
+{
+	int r, o = *old;
+	r = arch_atomic_cmpxchg(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
+#endif
+
+#ifndef arch_atomic_try_cmpxchg_acquire
+static __always_inline bool
+arch_atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
+{
+	int r, o = *old;
+	r = arch_atomic_cmpxchg_acquire(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic_try_cmpxchg_acquire arch_atomic_try_cmpxchg_acquire
+#endif
+
+#ifndef arch_atomic_try_cmpxchg_release
+static __always_inline bool
+arch_atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
+{
+	int r, o = *old;
+	r = arch_atomic_cmpxchg_release(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic_try_cmpxchg_release arch_atomic_try_cmpxchg_release
+#endif
+
+#ifndef arch_atomic_try_cmpxchg_relaxed
+static __always_inline bool
+arch_atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
+{
+	int r, o = *old;
+	r = arch_atomic_cmpxchg_relaxed(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic_try_cmpxchg_relaxed arch_atomic_try_cmpxchg_relaxed
+#endif
+
+#else /* arch_atomic_try_cmpxchg_relaxed */
+
+#ifndef arch_atomic_try_cmpxchg_acquire
+static __always_inline bool
+arch_atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
+{
+	bool ret = arch_atomic_try_cmpxchg_relaxed(v, old, new);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic_try_cmpxchg_acquire arch_atomic_try_cmpxchg_acquire
+#endif
+
+#ifndef arch_atomic_try_cmpxchg_release
+static __always_inline bool
+arch_atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
+{
+	__atomic_release_fence();
+	return arch_atomic_try_cmpxchg_relaxed(v, old, new);
+}
+#define arch_atomic_try_cmpxchg_release arch_atomic_try_cmpxchg_release
+#endif
+
+#ifndef arch_atomic_try_cmpxchg
+static __always_inline bool
+arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
+{
+	bool ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic_try_cmpxchg_relaxed(v, old, new);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg
+#endif
+
+#endif /* arch_atomic_try_cmpxchg_relaxed */
+
+#ifndef arch_atomic_sub_and_test
+/**
+ * arch_atomic_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @v: pointer of type atomic_t
+ *
+ * Atomically subtracts @i from @v and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+static __always_inline bool
+arch_atomic_sub_and_test(int i, atomic_t *v)
+{
+	return arch_atomic_sub_return(i, v) == 0;
+}
+#define arch_atomic_sub_and_test arch_atomic_sub_and_test
+#endif
+
+#ifndef arch_atomic_dec_and_test
+/**
+ * arch_atomic_dec_and_test - decrement and test
+ * @v: pointer of type atomic_t
+ *
+ * Atomically decrements @v by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+static __always_inline bool
+arch_atomic_dec_and_test(atomic_t *v)
+{
+	return arch_atomic_dec_return(v) == 0;
+}
+#define arch_atomic_dec_and_test arch_atomic_dec_and_test
+#endif
+
+#ifndef arch_atomic_inc_and_test
+/**
+ * arch_atomic_inc_and_test - increment and test
+ * @v: pointer of type atomic_t
+ *
+ * Atomically increments @v by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+static __always_inline bool
+arch_atomic_inc_and_test(atomic_t *v)
+{
+	return arch_atomic_inc_return(v) == 0;
+}
+#define arch_atomic_inc_and_test arch_atomic_inc_and_test
+#endif
+
+#ifndef arch_atomic_add_negative
+/**
+ * arch_atomic_add_negative - add and test if negative
+ * @i: integer value to add
+ * @v: pointer of type atomic_t
+ *
+ * Atomically adds @i to @v and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+static __always_inline bool
+arch_atomic_add_negative(int i, atomic_t *v)
+{
+	return arch_atomic_add_return(i, v) < 0;
+}
+#define arch_atomic_add_negative arch_atomic_add_negative
+#endif
+
+#ifndef arch_atomic_fetch_add_unless
+/**
+ * arch_atomic_fetch_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, so long as @v was not already @u.
+ * Returns original value of @v
+ */
+static __always_inline int
+arch_atomic_fetch_add_unless(atomic_t *v, int a, int u)
+{
+	int c = arch_atomic_read(v);
+
+	do {
+		if (unlikely(c == u))
+			break;
+	} while (!arch_atomic_try_cmpxchg(v, &c, c + a));
+
+	return c;
+}
+#define arch_atomic_fetch_add_unless arch_atomic_fetch_add_unless
+#endif
+
+#ifndef arch_atomic_add_unless
+/**
+ * arch_atomic_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if @v was not already @u.
+ * Returns true if the addition was done.
+ */
+static __always_inline bool
+arch_atomic_add_unless(atomic_t *v, int a, int u)
+{
+	return arch_atomic_fetch_add_unless(v, a, u) != u;
+}
+#define arch_atomic_add_unless arch_atomic_add_unless
+#endif
+
+#ifndef arch_atomic_inc_not_zero
+/**
+ * arch_atomic_inc_not_zero - increment unless the number is zero
+ * @v: pointer of type atomic_t
+ *
+ * Atomically increments @v by 1, if @v is non-zero.
+ * Returns true if the increment was done.
+ */
+static __always_inline bool
+arch_atomic_inc_not_zero(atomic_t *v)
+{
+	return arch_atomic_add_unless(v, 1, 0);
+}
+#define arch_atomic_inc_not_zero arch_atomic_inc_not_zero
+#endif
+
+#ifndef arch_atomic_inc_unless_negative
+static __always_inline bool
+arch_atomic_inc_unless_negative(atomic_t *v)
+{
+	int c = arch_atomic_read(v);
+
+	do {
+		if (unlikely(c < 0))
+			return false;
+	} while (!arch_atomic_try_cmpxchg(v, &c, c + 1));
+
+	return true;
+}
+#define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
+#endif
+
+#ifndef arch_atomic_dec_unless_positive
+static __always_inline bool
+arch_atomic_dec_unless_positive(atomic_t *v)
+{
+	int c = arch_atomic_read(v);
+
+	do {
+		if (unlikely(c > 0))
+			return false;
+	} while (!arch_atomic_try_cmpxchg(v, &c, c - 1));
+
+	return true;
+}
+#define arch_atomic_dec_unless_positive arch_atomic_dec_unless_positive
+#endif
+
+#ifndef arch_atomic_dec_if_positive
+static __always_inline int
+arch_atomic_dec_if_positive(atomic_t *v)
+{
+	int dec, c = arch_atomic_read(v);
+
+	do {
+		dec = c - 1;
+		if (unlikely(dec < 0))
+			break;
+	} while (!arch_atomic_try_cmpxchg(v, &c, dec));
+
+	return dec;
+}
+#define arch_atomic_dec_if_positive arch_atomic_dec_if_positive
+#endif
+
+#ifdef CONFIG_GENERIC_ATOMIC64
+#include <asm-generic/atomic64.h>
+#endif
+
+#ifndef arch_atomic64_read_acquire
+static __always_inline s64
+arch_atomic64_read_acquire(const atomic64_t *v)
+{
+	return smp_load_acquire(&(v)->counter);
+}
+#define arch_atomic64_read_acquire arch_atomic64_read_acquire
+#endif
+
+#ifndef arch_atomic64_set_release
+static __always_inline void
+arch_atomic64_set_release(atomic64_t *v, s64 i)
+{
+	smp_store_release(&(v)->counter, i);
+}
+#define arch_atomic64_set_release arch_atomic64_set_release
+#endif
+
+#ifndef arch_atomic64_add_return_relaxed
+#define arch_atomic64_add_return_acquire arch_atomic64_add_return
+#define arch_atomic64_add_return_release arch_atomic64_add_return
+#define arch_atomic64_add_return_relaxed arch_atomic64_add_return
+#else /* arch_atomic64_add_return_relaxed */
+
+#ifndef arch_atomic64_add_return_acquire
+static __always_inline s64
+arch_atomic64_add_return_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_add_return_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_add_return_acquire arch_atomic64_add_return_acquire
+#endif
+
+#ifndef arch_atomic64_add_return_release
+static __always_inline s64
+arch_atomic64_add_return_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_add_return_relaxed(i, v);
+}
+#define arch_atomic64_add_return_release arch_atomic64_add_return_release
+#endif
+
+#ifndef arch_atomic64_add_return
+static __always_inline s64
+arch_atomic64_add_return(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_add_return_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_add_return arch_atomic64_add_return
+#endif
+
+#endif /* arch_atomic64_add_return_relaxed */
+
+#ifndef arch_atomic64_fetch_add_relaxed
+#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add
+#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add
+#define arch_atomic64_fetch_add_relaxed arch_atomic64_fetch_add
+#else /* arch_atomic64_fetch_add_relaxed */
+
+#ifndef arch_atomic64_fetch_add_acquire
+static __always_inline s64
+arch_atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_add_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_add_release
+static __always_inline s64
+arch_atomic64_fetch_add_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_add_relaxed(i, v);
+}
+#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add_release
+#endif
+
+#ifndef arch_atomic64_fetch_add
+static __always_inline s64
+arch_atomic64_fetch_add(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_add_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_add arch_atomic64_fetch_add
+#endif
+
+#endif /* arch_atomic64_fetch_add_relaxed */
+
+#ifndef arch_atomic64_sub_return_relaxed
+#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return
+#define arch_atomic64_sub_return_release arch_atomic64_sub_return
+#define arch_atomic64_sub_return_relaxed arch_atomic64_sub_return
+#else /* arch_atomic64_sub_return_relaxed */
+
+#ifndef arch_atomic64_sub_return_acquire
+static __always_inline s64
+arch_atomic64_sub_return_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_sub_return_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return_acquire
+#endif
+
+#ifndef arch_atomic64_sub_return_release
+static __always_inline s64
+arch_atomic64_sub_return_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_sub_return_relaxed(i, v);
+}
+#define arch_atomic64_sub_return_release arch_atomic64_sub_return_release
+#endif
+
+#ifndef arch_atomic64_sub_return
+static __always_inline s64
+arch_atomic64_sub_return(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_sub_return_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_sub_return arch_atomic64_sub_return
+#endif
+
+#endif /* arch_atomic64_sub_return_relaxed */
+
+#ifndef arch_atomic64_fetch_sub_relaxed
+#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub
+#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub
+#define arch_atomic64_fetch_sub_relaxed arch_atomic64_fetch_sub
+#else /* arch_atomic64_fetch_sub_relaxed */
+
+#ifndef arch_atomic64_fetch_sub_acquire
+static __always_inline s64
+arch_atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_sub_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_sub_release
+static __always_inline s64
+arch_atomic64_fetch_sub_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_sub_relaxed(i, v);
+}
+#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub_release
+#endif
+
+#ifndef arch_atomic64_fetch_sub
+static __always_inline s64
+arch_atomic64_fetch_sub(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_sub_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub
+#endif
+
+#endif /* arch_atomic64_fetch_sub_relaxed */
+
+#ifndef arch_atomic64_inc
+static __always_inline void
+arch_atomic64_inc(atomic64_t *v)
+{
+	arch_atomic64_add(1, v);
+}
+#define arch_atomic64_inc arch_atomic64_inc
+#endif
+
+#ifndef arch_atomic64_inc_return_relaxed
+#ifdef arch_atomic64_inc_return
+#define arch_atomic64_inc_return_acquire arch_atomic64_inc_return
+#define arch_atomic64_inc_return_release arch_atomic64_inc_return
+#define arch_atomic64_inc_return_relaxed arch_atomic64_inc_return
+#endif /* arch_atomic64_inc_return */
+
+#ifndef arch_atomic64_inc_return
+static __always_inline s64
+arch_atomic64_inc_return(atomic64_t *v)
+{
+	return arch_atomic64_add_return(1, v);
+}
+#define arch_atomic64_inc_return arch_atomic64_inc_return
+#endif
+
+#ifndef arch_atomic64_inc_return_acquire
+static __always_inline s64
+arch_atomic64_inc_return_acquire(atomic64_t *v)
+{
+	return arch_atomic64_add_return_acquire(1, v);
+}
+#define arch_atomic64_inc_return_acquire arch_atomic64_inc_return_acquire
+#endif
+
+#ifndef arch_atomic64_inc_return_release
+static __always_inline s64
+arch_atomic64_inc_return_release(atomic64_t *v)
+{
+	return arch_atomic64_add_return_release(1, v);
+}
+#define arch_atomic64_inc_return_release arch_atomic64_inc_return_release
+#endif
+
+#ifndef arch_atomic64_inc_return_relaxed
+static __always_inline s64
+arch_atomic64_inc_return_relaxed(atomic64_t *v)
+{
+	return arch_atomic64_add_return_relaxed(1, v);
+}
+#define arch_atomic64_inc_return_relaxed arch_atomic64_inc_return_relaxed
+#endif
+
+#else /* arch_atomic64_inc_return_relaxed */
+
+#ifndef arch_atomic64_inc_return_acquire
+static __always_inline s64
+arch_atomic64_inc_return_acquire(atomic64_t *v)
+{
+	s64 ret = arch_atomic64_inc_return_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_inc_return_acquire arch_atomic64_inc_return_acquire
+#endif
+
+#ifndef arch_atomic64_inc_return_release
+static __always_inline s64
+arch_atomic64_inc_return_release(atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_inc_return_relaxed(v);
+}
+#define arch_atomic64_inc_return_release arch_atomic64_inc_return_release
+#endif
+
+#ifndef arch_atomic64_inc_return
+static __always_inline s64
+arch_atomic64_inc_return(atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_inc_return_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_inc_return arch_atomic64_inc_return
+#endif
+
+#endif /* arch_atomic64_inc_return_relaxed */
+
+#ifndef arch_atomic64_fetch_inc_relaxed
+#ifdef arch_atomic64_fetch_inc
+#define arch_atomic64_fetch_inc_acquire arch_atomic64_fetch_inc
+#define arch_atomic64_fetch_inc_release arch_atomic64_fetch_inc
+#define arch_atomic64_fetch_inc_relaxed arch_atomic64_fetch_inc
+#endif /* arch_atomic64_fetch_inc */
+
+#ifndef arch_atomic64_fetch_inc
+static __always_inline s64
+arch_atomic64_fetch_inc(atomic64_t *v)
+{
+	return arch_atomic64_fetch_add(1, v);
+}
+#define arch_atomic64_fetch_inc arch_atomic64_fetch_inc
+#endif
+
+#ifndef arch_atomic64_fetch_inc_acquire
+static __always_inline s64
+arch_atomic64_fetch_inc_acquire(atomic64_t *v)
+{
+	return arch_atomic64_fetch_add_acquire(1, v);
+}
+#define arch_atomic64_fetch_inc_acquire arch_atomic64_fetch_inc_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_inc_release
+static __always_inline s64
+arch_atomic64_fetch_inc_release(atomic64_t *v)
+{
+	return arch_atomic64_fetch_add_release(1, v);
+}
+#define arch_atomic64_fetch_inc_release arch_atomic64_fetch_inc_release
+#endif
+
+#ifndef arch_atomic64_fetch_inc_relaxed
+static __always_inline s64
+arch_atomic64_fetch_inc_relaxed(atomic64_t *v)
+{
+	return arch_atomic64_fetch_add_relaxed(1, v);
+}
+#define arch_atomic64_fetch_inc_relaxed arch_atomic64_fetch_inc_relaxed
+#endif
+
+#else /* arch_atomic64_fetch_inc_relaxed */
+
+#ifndef arch_atomic64_fetch_inc_acquire
+static __always_inline s64
+arch_atomic64_fetch_inc_acquire(atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_inc_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_inc_acquire arch_atomic64_fetch_inc_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_inc_release
+static __always_inline s64
+arch_atomic64_fetch_inc_release(atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_inc_relaxed(v);
+}
+#define arch_atomic64_fetch_inc_release arch_atomic64_fetch_inc_release
+#endif
+
+#ifndef arch_atomic64_fetch_inc
+static __always_inline s64
+arch_atomic64_fetch_inc(atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_inc_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_inc arch_atomic64_fetch_inc
+#endif
+
+#endif /* arch_atomic64_fetch_inc_relaxed */
+
+#ifndef arch_atomic64_dec
+static __always_inline void
+arch_atomic64_dec(atomic64_t *v)
+{
+	arch_atomic64_sub(1, v);
+}
+#define arch_atomic64_dec arch_atomic64_dec
+#endif
+
+#ifndef arch_atomic64_dec_return_relaxed
+#ifdef arch_atomic64_dec_return
+#define arch_atomic64_dec_return_acquire arch_atomic64_dec_return
+#define arch_atomic64_dec_return_release arch_atomic64_dec_return
+#define arch_atomic64_dec_return_relaxed arch_atomic64_dec_return
+#endif /* arch_atomic64_dec_return */
+
+#ifndef arch_atomic64_dec_return
+static __always_inline s64
+arch_atomic64_dec_return(atomic64_t *v)
+{
+	return arch_atomic64_sub_return(1, v);
+}
+#define arch_atomic64_dec_return arch_atomic64_dec_return
+#endif
+
+#ifndef arch_atomic64_dec_return_acquire
+static __always_inline s64
+arch_atomic64_dec_return_acquire(atomic64_t *v)
+{
+	return arch_atomic64_sub_return_acquire(1, v);
+}
+#define arch_atomic64_dec_return_acquire arch_atomic64_dec_return_acquire
+#endif
+
+#ifndef arch_atomic64_dec_return_release
+static __always_inline s64
+arch_atomic64_dec_return_release(atomic64_t *v)
+{
+	return arch_atomic64_sub_return_release(1, v);
+}
+#define arch_atomic64_dec_return_release arch_atomic64_dec_return_release
+#endif
+
+#ifndef arch_atomic64_dec_return_relaxed
+static __always_inline s64
+arch_atomic64_dec_return_relaxed(atomic64_t *v)
+{
+	return arch_atomic64_sub_return_relaxed(1, v);
+}
+#define arch_atomic64_dec_return_relaxed arch_atomic64_dec_return_relaxed
+#endif
+
+#else /* arch_atomic64_dec_return_relaxed */
+
+#ifndef arch_atomic64_dec_return_acquire
+static __always_inline s64
+arch_atomic64_dec_return_acquire(atomic64_t *v)
+{
+	s64 ret = arch_atomic64_dec_return_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_dec_return_acquire arch_atomic64_dec_return_acquire
+#endif
+
+#ifndef arch_atomic64_dec_return_release
+static __always_inline s64
+arch_atomic64_dec_return_release(atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_dec_return_relaxed(v);
+}
+#define arch_atomic64_dec_return_release arch_atomic64_dec_return_release
+#endif
+
+#ifndef arch_atomic64_dec_return
+static __always_inline s64
+arch_atomic64_dec_return(atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_dec_return_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_dec_return arch_atomic64_dec_return
+#endif
+
+#endif /* arch_atomic64_dec_return_relaxed */
+
+#ifndef arch_atomic64_fetch_dec_relaxed
+#ifdef arch_atomic64_fetch_dec
+#define arch_atomic64_fetch_dec_acquire arch_atomic64_fetch_dec
+#define arch_atomic64_fetch_dec_release arch_atomic64_fetch_dec
+#define arch_atomic64_fetch_dec_relaxed arch_atomic64_fetch_dec
+#endif /* arch_atomic64_fetch_dec */
+
+#ifndef arch_atomic64_fetch_dec
+static __always_inline s64
+arch_atomic64_fetch_dec(atomic64_t *v)
+{
+	return arch_atomic64_fetch_sub(1, v);
+}
+#define arch_atomic64_fetch_dec arch_atomic64_fetch_dec
+#endif
+
+#ifndef arch_atomic64_fetch_dec_acquire
+static __always_inline s64
+arch_atomic64_fetch_dec_acquire(atomic64_t *v)
+{
+	return arch_atomic64_fetch_sub_acquire(1, v);
+}
+#define arch_atomic64_fetch_dec_acquire arch_atomic64_fetch_dec_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_dec_release
+static __always_inline s64
+arch_atomic64_fetch_dec_release(atomic64_t *v)
+{
+	return arch_atomic64_fetch_sub_release(1, v);
+}
+#define arch_atomic64_fetch_dec_release arch_atomic64_fetch_dec_release
+#endif
+
+#ifndef arch_atomic64_fetch_dec_relaxed
+static __always_inline s64
+arch_atomic64_fetch_dec_relaxed(atomic64_t *v)
+{
+	return arch_atomic64_fetch_sub_relaxed(1, v);
+}
+#define arch_atomic64_fetch_dec_relaxed arch_atomic64_fetch_dec_relaxed
+#endif
+
+#else /* arch_atomic64_fetch_dec_relaxed */
+
+#ifndef arch_atomic64_fetch_dec_acquire
+static __always_inline s64
+arch_atomic64_fetch_dec_acquire(atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_dec_relaxed(v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_dec_acquire arch_atomic64_fetch_dec_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_dec_release
+static __always_inline s64
+arch_atomic64_fetch_dec_release(atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_dec_relaxed(v);
+}
+#define arch_atomic64_fetch_dec_release arch_atomic64_fetch_dec_release
+#endif
+
+#ifndef arch_atomic64_fetch_dec
+static __always_inline s64
+arch_atomic64_fetch_dec(atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_dec_relaxed(v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_dec arch_atomic64_fetch_dec
+#endif
+
+#endif /* arch_atomic64_fetch_dec_relaxed */
+
+#ifndef arch_atomic64_fetch_and_relaxed
+#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and
+#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and
+#define arch_atomic64_fetch_and_relaxed arch_atomic64_fetch_and
+#else /* arch_atomic64_fetch_and_relaxed */
+
+#ifndef arch_atomic64_fetch_and_acquire
+static __always_inline s64
+arch_atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_and_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_and_release
+static __always_inline s64
+arch_atomic64_fetch_and_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_and_relaxed(i, v);
+}
+#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and_release
+#endif
+
+#ifndef arch_atomic64_fetch_and
+static __always_inline s64
+arch_atomic64_fetch_and(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_and_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_and arch_atomic64_fetch_and
+#endif
+
+#endif /* arch_atomic64_fetch_and_relaxed */
+
+#ifndef arch_atomic64_andnot
+static __always_inline void
+arch_atomic64_andnot(s64 i, atomic64_t *v)
+{
+	arch_atomic64_and(~i, v);
+}
+#define arch_atomic64_andnot arch_atomic64_andnot
+#endif
+
+#ifndef arch_atomic64_fetch_andnot_relaxed
+#ifdef arch_atomic64_fetch_andnot
+#define arch_atomic64_fetch_andnot_acquire arch_atomic64_fetch_andnot
+#define arch_atomic64_fetch_andnot_release arch_atomic64_fetch_andnot
+#define arch_atomic64_fetch_andnot_relaxed arch_atomic64_fetch_andnot
+#endif /* arch_atomic64_fetch_andnot */
+
+#ifndef arch_atomic64_fetch_andnot
+static __always_inline s64
+arch_atomic64_fetch_andnot(s64 i, atomic64_t *v)
+{
+	return arch_atomic64_fetch_and(~i, v);
+}
+#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot
+#endif
+
+#ifndef arch_atomic64_fetch_andnot_acquire
+static __always_inline s64
+arch_atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
+{
+	return arch_atomic64_fetch_and_acquire(~i, v);
+}
+#define arch_atomic64_fetch_andnot_acquire arch_atomic64_fetch_andnot_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_andnot_release
+static __always_inline s64
+arch_atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
+{
+	return arch_atomic64_fetch_and_release(~i, v);
+}
+#define arch_atomic64_fetch_andnot_release arch_atomic64_fetch_andnot_release
+#endif
+
+#ifndef arch_atomic64_fetch_andnot_relaxed
+static __always_inline s64
+arch_atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
+{
+	return arch_atomic64_fetch_and_relaxed(~i, v);
+}
+#define arch_atomic64_fetch_andnot_relaxed arch_atomic64_fetch_andnot_relaxed
+#endif
+
+#else /* arch_atomic64_fetch_andnot_relaxed */
+
+#ifndef arch_atomic64_fetch_andnot_acquire
+static __always_inline s64
+arch_atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_andnot_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_andnot_acquire arch_atomic64_fetch_andnot_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_andnot_release
+static __always_inline s64
+arch_atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_andnot_relaxed(i, v);
+}
+#define arch_atomic64_fetch_andnot_release arch_atomic64_fetch_andnot_release
+#endif
+
+#ifndef arch_atomic64_fetch_andnot
+static __always_inline s64
+arch_atomic64_fetch_andnot(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_andnot_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_andnot arch_atomic64_fetch_andnot
+#endif
+
+#endif /* arch_atomic64_fetch_andnot_relaxed */
+
+#ifndef arch_atomic64_fetch_or_relaxed
+#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or
+#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or
+#define arch_atomic64_fetch_or_relaxed arch_atomic64_fetch_or
+#else /* arch_atomic64_fetch_or_relaxed */
+
+#ifndef arch_atomic64_fetch_or_acquire
+static __always_inline s64
+arch_atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_or_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_or_release
+static __always_inline s64
+arch_atomic64_fetch_or_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_or_relaxed(i, v);
+}
+#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or_release
+#endif
+
+#ifndef arch_atomic64_fetch_or
+static __always_inline s64
+arch_atomic64_fetch_or(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_or_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_or arch_atomic64_fetch_or
+#endif
+
+#endif /* arch_atomic64_fetch_or_relaxed */
+
+#ifndef arch_atomic64_fetch_xor_relaxed
+#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor
+#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor
+#define arch_atomic64_fetch_xor_relaxed arch_atomic64_fetch_xor
+#else /* arch_atomic64_fetch_xor_relaxed */
+
+#ifndef arch_atomic64_fetch_xor_acquire
+static __always_inline s64
+arch_atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
+{
+	s64 ret = arch_atomic64_fetch_xor_relaxed(i, v);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor_acquire
+#endif
+
+#ifndef arch_atomic64_fetch_xor_release
+static __always_inline s64
+arch_atomic64_fetch_xor_release(s64 i, atomic64_t *v)
+{
+	__atomic_release_fence();
+	return arch_atomic64_fetch_xor_relaxed(i, v);
+}
+#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor_release
+#endif
+
+#ifndef arch_atomic64_fetch_xor
+static __always_inline s64
+arch_atomic64_fetch_xor(s64 i, atomic64_t *v)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_fetch_xor_relaxed(i, v);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor
+#endif
+
+#endif /* arch_atomic64_fetch_xor_relaxed */
+
+#ifndef arch_atomic64_xchg_relaxed
+#define arch_atomic64_xchg_acquire arch_atomic64_xchg
+#define arch_atomic64_xchg_release arch_atomic64_xchg
+#define arch_atomic64_xchg_relaxed arch_atomic64_xchg
+#else /* arch_atomic64_xchg_relaxed */
+
+#ifndef arch_atomic64_xchg_acquire
+static __always_inline s64
+arch_atomic64_xchg_acquire(atomic64_t *v, s64 i)
+{
+	s64 ret = arch_atomic64_xchg_relaxed(v, i);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_xchg_acquire arch_atomic64_xchg_acquire
+#endif
+
+#ifndef arch_atomic64_xchg_release
+static __always_inline s64
+arch_atomic64_xchg_release(atomic64_t *v, s64 i)
+{
+	__atomic_release_fence();
+	return arch_atomic64_xchg_relaxed(v, i);
+}
+#define arch_atomic64_xchg_release arch_atomic64_xchg_release
+#endif
+
+#ifndef arch_atomic64_xchg
+static __always_inline s64
+arch_atomic64_xchg(atomic64_t *v, s64 i)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_xchg_relaxed(v, i);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_xchg arch_atomic64_xchg
+#endif
+
+#endif /* arch_atomic64_xchg_relaxed */
+
+#ifndef arch_atomic64_cmpxchg_relaxed
+#define arch_atomic64_cmpxchg_acquire arch_atomic64_cmpxchg
+#define arch_atomic64_cmpxchg_release arch_atomic64_cmpxchg
+#define arch_atomic64_cmpxchg_relaxed arch_atomic64_cmpxchg
+#else /* arch_atomic64_cmpxchg_relaxed */
+
+#ifndef arch_atomic64_cmpxchg_acquire
+static __always_inline s64
+arch_atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
+{
+	s64 ret = arch_atomic64_cmpxchg_relaxed(v, old, new);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_cmpxchg_acquire arch_atomic64_cmpxchg_acquire
+#endif
+
+#ifndef arch_atomic64_cmpxchg_release
+static __always_inline s64
+arch_atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
+{
+	__atomic_release_fence();
+	return arch_atomic64_cmpxchg_relaxed(v, old, new);
+}
+#define arch_atomic64_cmpxchg_release arch_atomic64_cmpxchg_release
+#endif
+
+#ifndef arch_atomic64_cmpxchg
+static __always_inline s64
+arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
+{
+	s64 ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_cmpxchg_relaxed(v, old, new);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_cmpxchg arch_atomic64_cmpxchg
+#endif
+
+#endif /* arch_atomic64_cmpxchg_relaxed */
+
+#ifndef arch_atomic64_try_cmpxchg_relaxed
+#ifdef arch_atomic64_try_cmpxchg
+#define arch_atomic64_try_cmpxchg_acquire arch_atomic64_try_cmpxchg
+#define arch_atomic64_try_cmpxchg_release arch_atomic64_try_cmpxchg
+#define arch_atomic64_try_cmpxchg_relaxed arch_atomic64_try_cmpxchg
+#endif /* arch_atomic64_try_cmpxchg */
+
+#ifndef arch_atomic64_try_cmpxchg
+static __always_inline bool
+arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+	s64 r, o = *old;
+	r = arch_atomic64_cmpxchg(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+#endif
+
+#ifndef arch_atomic64_try_cmpxchg_acquire
+static __always_inline bool
+arch_atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
+{
+	s64 r, o = *old;
+	r = arch_atomic64_cmpxchg_acquire(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic64_try_cmpxchg_acquire arch_atomic64_try_cmpxchg_acquire
+#endif
+
+#ifndef arch_atomic64_try_cmpxchg_release
+static __always_inline bool
+arch_atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
+{
+	s64 r, o = *old;
+	r = arch_atomic64_cmpxchg_release(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic64_try_cmpxchg_release arch_atomic64_try_cmpxchg_release
+#endif
+
+#ifndef arch_atomic64_try_cmpxchg_relaxed
+static __always_inline bool
+arch_atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
+{
+	s64 r, o = *old;
+	r = arch_atomic64_cmpxchg_relaxed(v, o, new);
+	if (unlikely(r != o))
+		*old = r;
+	return likely(r == o);
+}
+#define arch_atomic64_try_cmpxchg_relaxed arch_atomic64_try_cmpxchg_relaxed
+#endif
+
+#else /* arch_atomic64_try_cmpxchg_relaxed */
+
+#ifndef arch_atomic64_try_cmpxchg_acquire
+static __always_inline bool
+arch_atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
+{
+	bool ret = arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+	__atomic_acquire_fence();
+	return ret;
+}
+#define arch_atomic64_try_cmpxchg_acquire arch_atomic64_try_cmpxchg_acquire
+#endif
+
+#ifndef arch_atomic64_try_cmpxchg_release
+static __always_inline bool
+arch_atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
+{
+	__atomic_release_fence();
+	return arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+}
+#define arch_atomic64_try_cmpxchg_release arch_atomic64_try_cmpxchg_release
+#endif
+
+#ifndef arch_atomic64_try_cmpxchg
+static __always_inline bool
+arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
+{
+	bool ret;
+	__atomic_pre_full_fence();
+	ret = arch_atomic64_try_cmpxchg_relaxed(v, old, new);
+	__atomic_post_full_fence();
+	return ret;
+}
+#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg
+#endif
+
+#endif /* arch_atomic64_try_cmpxchg_relaxed */
+
+#ifndef arch_atomic64_sub_and_test
+/**
+ * arch_atomic64_sub_and_test - subtract value from variable and test result
+ * @i: integer value to subtract
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically subtracts @i from @v and returns
+ * true if the result is zero, or false for all
+ * other cases.
+ */
+static __always_inline bool
+arch_atomic64_sub_and_test(s64 i, atomic64_t *v)
+{
+	return arch_atomic64_sub_return(i, v) == 0;
+}
+#define arch_atomic64_sub_and_test arch_atomic64_sub_and_test
+#endif
+
+#ifndef arch_atomic64_dec_and_test
+/**
+ * arch_atomic64_dec_and_test - decrement and test
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically decrements @v by 1 and
+ * returns true if the result is 0, or false for all other
+ * cases.
+ */
+static __always_inline bool
+arch_atomic64_dec_and_test(atomic64_t *v)
+{
+	return arch_atomic64_dec_return(v) == 0;
+}
+#define arch_atomic64_dec_and_test arch_atomic64_dec_and_test
+#endif
+
+#ifndef arch_atomic64_inc_and_test
+/**
+ * arch_atomic64_inc_and_test - increment and test
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically increments @v by 1
+ * and returns true if the result is zero, or false for all
+ * other cases.
+ */
+static __always_inline bool
+arch_atomic64_inc_and_test(atomic64_t *v)
+{
+	return arch_atomic64_inc_return(v) == 0;
+}
+#define arch_atomic64_inc_and_test arch_atomic64_inc_and_test
+#endif
+
+#ifndef arch_atomic64_add_negative
+/**
+ * arch_atomic64_add_negative - add and test if negative
+ * @i: integer value to add
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically adds @i to @v and returns true
+ * if the result is negative, or false when
+ * result is greater than or equal to zero.
+ */
+static __always_inline bool
+arch_atomic64_add_negative(s64 i, atomic64_t *v)
+{
+	return arch_atomic64_add_return(i, v) < 0;
+}
+#define arch_atomic64_add_negative arch_atomic64_add_negative
+#endif
+
+#ifndef arch_atomic64_fetch_add_unless
+/**
+ * arch_atomic64_fetch_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic64_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, so long as @v was not already @u.
+ * Returns original value of @v
+ */
+static __always_inline s64
+arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
+{
+	s64 c = arch_atomic64_read(v);
+
+	do {
+		if (unlikely(c == u))
+			break;
+	} while (!arch_atomic64_try_cmpxchg(v, &c, c + a));
+
+	return c;
+}
+#define arch_atomic64_fetch_add_unless arch_atomic64_fetch_add_unless
+#endif
+
+#ifndef arch_atomic64_add_unless
+/**
+ * arch_atomic64_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic64_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, if @v was not already @u.
+ * Returns true if the addition was done.
+ */
+static __always_inline bool
+arch_atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
+{
+	return arch_atomic64_fetch_add_unless(v, a, u) != u;
+}
+#define arch_atomic64_add_unless arch_atomic64_add_unless
+#endif
+
+#ifndef arch_atomic64_inc_not_zero
+/**
+ * arch_atomic64_inc_not_zero - increment unless the number is zero
+ * @v: pointer of type atomic64_t
+ *
+ * Atomically increments @v by 1, if @v is non-zero.
+ * Returns true if the increment was done.
+ */
+static __always_inline bool
+arch_atomic64_inc_not_zero(atomic64_t *v)
+{
+	return arch_atomic64_add_unless(v, 1, 0);
+}
+#define arch_atomic64_inc_not_zero arch_atomic64_inc_not_zero
+#endif
+
+#ifndef arch_atomic64_inc_unless_negative
+static __always_inline bool
+arch_atomic64_inc_unless_negative(atomic64_t *v)
+{
+	s64 c = arch_atomic64_read(v);
+
+	do {
+		if (unlikely(c < 0))
+			return false;
+	} while (!arch_atomic64_try_cmpxchg(v, &c, c + 1));
+
+	return true;
+}
+#define arch_atomic64_inc_unless_negative arch_atomic64_inc_unless_negative
+#endif
+
+#ifndef arch_atomic64_dec_unless_positive
+static __always_inline bool
+arch_atomic64_dec_unless_positive(atomic64_t *v)
+{
+	s64 c = arch_atomic64_read(v);
+
+	do {
+		if (unlikely(c > 0))
+			return false;
+	} while (!arch_atomic64_try_cmpxchg(v, &c, c - 1));
+
+	return true;
+}
+#define arch_atomic64_dec_unless_positive arch_atomic64_dec_unless_positive
+#endif
+
+#ifndef arch_atomic64_dec_if_positive
+static __always_inline s64
+arch_atomic64_dec_if_positive(atomic64_t *v)
+{
+	s64 dec, c = arch_atomic64_read(v);
+
+	do {
+		dec = c - 1;
+		if (unlikely(dec < 0))
+			break;
+	} while (!arch_atomic64_try_cmpxchg(v, &c, dec));
+
+	return dec;
+}
+#define arch_atomic64_dec_if_positive arch_atomic64_dec_if_positive
+#endif
+
+#endif /* _LINUX_ATOMIC_FALLBACK_H */
+// 90cd26cfd69d2250303d654955a0cc12620fb91b
--- a/include/linux/atomic-fallback.h
+++ b/include/linux/atomic-fallback.h
@@ -1180,9 +1180,6 @@ atomic_dec_if_positive(atomic_t *v)
 #define atomic_dec_if_positive atomic_dec_if_positive
 #endif
 
-#define atomic_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
-#define atomic_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
-
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
@@ -2290,8 +2287,5 @@ atomic64_dec_if_positive(atomic64_t *v)
 #define atomic64_dec_if_positive atomic64_dec_if_positive
 #endif
 
-#define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
-#define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
-
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
-// baaf45f4c24ed88ceae58baca39d7fd80bb8101b
+// 1fac0941c79bf0ae100723cc2ac9b94061f0b67a
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -25,6 +25,12 @@
  * See Documentation/memory-barriers.txt for ACQUIRE/RELEASE definitions.
  */
 
+#define atomic_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
+#define atomic_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
+
+#define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
+#define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
+
 /*
  * The idea here is to build acquire/release variants by adding explicit
  * barriers on top of the relaxed variant. In the case where the relaxed
@@ -71,7 +77,12 @@
 	__ret;								\
 })
 
+#ifdef ARCH_ATOMIC
+#include <linux/atomic-arch-fallback.h>
+#include <asm-generic/atomic-instrumented.h>
+#else
 #include <linux/atomic-fallback.h>
+#endif
 
 #include <asm-generic/atomic-long.h>
 
--- a/scripts/atomic/fallbacks/acquire
+++ b/scripts/atomic/fallbacks/acquire
@@ -1,8 +1,8 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_${pfx}${name}${sfx}_acquire(${params})
+${arch}${atomic}_${pfx}${name}${sfx}_acquire(${params})
 {
-	${ret} ret = ${atomic}_${pfx}${name}${sfx}_relaxed(${args});
+	${ret} ret = ${arch}${atomic}_${pfx}${name}${sfx}_relaxed(${args});
 	__atomic_acquire_fence();
 	return ret;
 }
--- a/scripts/atomic/fallbacks/add_negative
+++ b/scripts/atomic/fallbacks/add_negative
@@ -1,6 +1,6 @@
 cat <<EOF
 /**
- * ${atomic}_add_negative - add and test if negative
+ * ${arch}${atomic}_add_negative - add and test if negative
  * @i: integer value to add
  * @v: pointer of type ${atomic}_t
  *
@@ -9,8 +9,8 @@ cat <<EOF
  * result is greater than or equal to zero.
  */
 static __always_inline bool
-${atomic}_add_negative(${int} i, ${atomic}_t *v)
+${arch}${atomic}_add_negative(${int} i, ${atomic}_t *v)
 {
-	return ${atomic}_add_return(i, v) < 0;
+	return ${arch}${atomic}_add_return(i, v) < 0;
 }
 EOF
--- a/scripts/atomic/fallbacks/add_unless
+++ b/scripts/atomic/fallbacks/add_unless
@@ -1,6 +1,6 @@
 cat << EOF
 /**
- * ${atomic}_add_unless - add unless the number is already a given value
+ * ${arch}${atomic}_add_unless - add unless the number is already a given value
  * @v: pointer of type ${atomic}_t
  * @a: the amount to add to v...
  * @u: ...unless v is equal to u.
@@ -9,8 +9,8 @@ cat << EOF
  * Returns true if the addition was done.
  */
 static __always_inline bool
-${atomic}_add_unless(${atomic}_t *v, ${int} a, ${int} u)
+${arch}${atomic}_add_unless(${atomic}_t *v, ${int} a, ${int} u)
 {
-	return ${atomic}_fetch_add_unless(v, a, u) != u;
+	return ${arch}${atomic}_fetch_add_unless(v, a, u) != u;
 }
 EOF
--- a/scripts/atomic/fallbacks/andnot
+++ b/scripts/atomic/fallbacks/andnot
@@ -1,7 +1,7 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_${pfx}andnot${sfx}${order}(${int} i, ${atomic}_t *v)
+${arch}${atomic}_${pfx}andnot${sfx}${order}(${int} i, ${atomic}_t *v)
 {
-	${retstmt}${atomic}_${pfx}and${sfx}${order}(~i, v);
+	${retstmt}${arch}${atomic}_${pfx}and${sfx}${order}(~i, v);
 }
 EOF
--- a/scripts/atomic/fallbacks/dec
+++ b/scripts/atomic/fallbacks/dec
@@ -1,7 +1,7 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_${pfx}dec${sfx}${order}(${atomic}_t *v)
+${arch}${atomic}_${pfx}dec${sfx}${order}(${atomic}_t *v)
 {
-	${retstmt}${atomic}_${pfx}sub${sfx}${order}(1, v);
+	${retstmt}${arch}${atomic}_${pfx}sub${sfx}${order}(1, v);
 }
 EOF
--- a/scripts/atomic/fallbacks/dec_and_test
+++ b/scripts/atomic/fallbacks/dec_and_test
@@ -1,6 +1,6 @@
 cat <<EOF
 /**
- * ${atomic}_dec_and_test - decrement and test
+ * ${arch}${atomic}_dec_and_test - decrement and test
  * @v: pointer of type ${atomic}_t
  *
  * Atomically decrements @v by 1 and
@@ -8,8 +8,8 @@ cat <<EOF
  * cases.
  */
 static __always_inline bool
-${atomic}_dec_and_test(${atomic}_t *v)
+${arch}${atomic}_dec_and_test(${atomic}_t *v)
 {
-	return ${atomic}_dec_return(v) == 0;
+	return ${arch}${atomic}_dec_return(v) == 0;
 }
 EOF
--- a/scripts/atomic/fallbacks/dec_if_positive
+++ b/scripts/atomic/fallbacks/dec_if_positive
@@ -1,14 +1,14 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_dec_if_positive(${atomic}_t *v)
+${arch}${atomic}_dec_if_positive(${atomic}_t *v)
 {
-	${int} dec, c = ${atomic}_read(v);
+	${int} dec, c = ${arch}${atomic}_read(v);
 
 	do {
 		dec = c - 1;
 		if (unlikely(dec < 0))
 			break;
-	} while (!${atomic}_try_cmpxchg(v, &c, dec));
+	} while (!${arch}${atomic}_try_cmpxchg(v, &c, dec));
 
 	return dec;
 }
--- a/scripts/atomic/fallbacks/dec_unless_positive
+++ b/scripts/atomic/fallbacks/dec_unless_positive
@@ -1,13 +1,13 @@
 cat <<EOF
 static __always_inline bool
-${atomic}_dec_unless_positive(${atomic}_t *v)
+${arch}${atomic}_dec_unless_positive(${atomic}_t *v)
 {
-	${int} c = ${atomic}_read(v);
+	${int} c = ${arch}${atomic}_read(v);
 
 	do {
 		if (unlikely(c > 0))
 			return false;
-	} while (!${atomic}_try_cmpxchg(v, &c, c - 1));
+	} while (!${arch}${atomic}_try_cmpxchg(v, &c, c - 1));
 
 	return true;
 }
--- a/scripts/atomic/fallbacks/fence
+++ b/scripts/atomic/fallbacks/fence
@@ -1,10 +1,10 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_${pfx}${name}${sfx}(${params})
+${arch}${atomic}_${pfx}${name}${sfx}(${params})
 {
 	${ret} ret;
 	__atomic_pre_full_fence();
-	ret = ${atomic}_${pfx}${name}${sfx}_relaxed(${args});
+	ret = ${arch}${atomic}_${pfx}${name}${sfx}_relaxed(${args});
 	__atomic_post_full_fence();
 	return ret;
 }
--- a/scripts/atomic/fallbacks/fetch_add_unless
+++ b/scripts/atomic/fallbacks/fetch_add_unless
@@ -1,6 +1,6 @@
 cat << EOF
 /**
- * ${atomic}_fetch_add_unless - add unless the number is already a given value
+ * ${arch}${atomic}_fetch_add_unless - add unless the number is already a given value
  * @v: pointer of type ${atomic}_t
  * @a: the amount to add to v...
  * @u: ...unless v is equal to u.
@@ -9,14 +9,14 @@ cat << EOF
  * Returns original value of @v
  */
 static __always_inline ${int}
-${atomic}_fetch_add_unless(${atomic}_t *v, ${int} a, ${int} u)
+${arch}${atomic}_fetch_add_unless(${atomic}_t *v, ${int} a, ${int} u)
 {
-	${int} c = ${atomic}_read(v);
+	${int} c = ${arch}${atomic}_read(v);
 
 	do {
 		if (unlikely(c == u))
 			break;
-	} while (!${atomic}_try_cmpxchg(v, &c, c + a));
+	} while (!${arch}${atomic}_try_cmpxchg(v, &c, c + a));
 
 	return c;
 }
--- a/scripts/atomic/fallbacks/inc
+++ b/scripts/atomic/fallbacks/inc
@@ -1,7 +1,7 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_${pfx}inc${sfx}${order}(${atomic}_t *v)
+${arch}${atomic}_${pfx}inc${sfx}${order}(${atomic}_t *v)
 {
-	${retstmt}${atomic}_${pfx}add${sfx}${order}(1, v);
+	${retstmt}${arch}${atomic}_${pfx}add${sfx}${order}(1, v);
 }
 EOF
--- a/scripts/atomic/fallbacks/inc_and_test
+++ b/scripts/atomic/fallbacks/inc_and_test
@@ -1,6 +1,6 @@
 cat <<EOF
 /**
- * ${atomic}_inc_and_test - increment and test
+ * ${arch}${atomic}_inc_and_test - increment and test
  * @v: pointer of type ${atomic}_t
  *
  * Atomically increments @v by 1
@@ -8,8 +8,8 @@ cat <<EOF
  * other cases.
  */
 static __always_inline bool
-${atomic}_inc_and_test(${atomic}_t *v)
+${arch}${atomic}_inc_and_test(${atomic}_t *v)
 {
-	return ${atomic}_inc_return(v) == 0;
+	return ${arch}${atomic}_inc_return(v) == 0;
 }
 EOF
--- a/scripts/atomic/fallbacks/inc_not_zero
+++ b/scripts/atomic/fallbacks/inc_not_zero
@@ -1,14 +1,14 @@
 cat <<EOF
 /**
- * ${atomic}_inc_not_zero - increment unless the number is zero
+ * ${arch}${atomic}_inc_not_zero - increment unless the number is zero
  * @v: pointer of type ${atomic}_t
  *
  * Atomically increments @v by 1, if @v is non-zero.
  * Returns true if the increment was done.
  */
 static __always_inline bool
-${atomic}_inc_not_zero(${atomic}_t *v)
+${arch}${atomic}_inc_not_zero(${atomic}_t *v)
 {
-	return ${atomic}_add_unless(v, 1, 0);
+	return ${arch}${atomic}_add_unless(v, 1, 0);
 }
 EOF
--- a/scripts/atomic/fallbacks/inc_unless_negative
+++ b/scripts/atomic/fallbacks/inc_unless_negative
@@ -1,13 +1,13 @@
 cat <<EOF
 static __always_inline bool
-${atomic}_inc_unless_negative(${atomic}_t *v)
+${arch}${atomic}_inc_unless_negative(${atomic}_t *v)
 {
-	${int} c = ${atomic}_read(v);
+	${int} c = ${arch}${atomic}_read(v);
 
 	do {
 		if (unlikely(c < 0))
 			return false;
-	} while (!${atomic}_try_cmpxchg(v, &c, c + 1));
+	} while (!${arch}${atomic}_try_cmpxchg(v, &c, c + 1));
 
 	return true;
 }
--- a/scripts/atomic/fallbacks/read_acquire
+++ b/scripts/atomic/fallbacks/read_acquire
@@ -1,6 +1,6 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_read_acquire(const ${atomic}_t *v)
+${arch}${atomic}_read_acquire(const ${atomic}_t *v)
 {
 	return smp_load_acquire(&(v)->counter);
 }
--- a/scripts/atomic/fallbacks/release
+++ b/scripts/atomic/fallbacks/release
@@ -1,8 +1,8 @@
 cat <<EOF
 static __always_inline ${ret}
-${atomic}_${pfx}${name}${sfx}_release(${params})
+${arch}${atomic}_${pfx}${name}${sfx}_release(${params})
 {
 	__atomic_release_fence();
-	${retstmt}${atomic}_${pfx}${name}${sfx}_relaxed(${args});
+	${retstmt}${arch}${atomic}_${pfx}${name}${sfx}_relaxed(${args});
 }
 EOF
--- a/scripts/atomic/fallbacks/set_release
+++ b/scripts/atomic/fallbacks/set_release
@@ -1,6 +1,6 @@
 cat <<EOF
 static __always_inline void
-${atomic}_set_release(${atomic}_t *v, ${int} i)
+${arch}${atomic}_set_release(${atomic}_t *v, ${int} i)
 {
 	smp_store_release(&(v)->counter, i);
 }
--- a/scripts/atomic/fallbacks/sub_and_test
+++ b/scripts/atomic/fallbacks/sub_and_test
@@ -1,6 +1,6 @@
 cat <<EOF
 /**
- * ${atomic}_sub_and_test - subtract value from variable and test result
+ * ${arch}${atomic}_sub_and_test - subtract value from variable and test result
  * @i: integer value to subtract
  * @v: pointer of type ${atomic}_t
  *
@@ -9,8 +9,8 @@ cat <<EOF
  * other cases.
  */
 static __always_inline bool
-${atomic}_sub_and_test(${int} i, ${atomic}_t *v)
+${arch}${atomic}_sub_and_test(${int} i, ${atomic}_t *v)
 {
-	return ${atomic}_sub_return(i, v) == 0;
+	return ${arch}${atomic}_sub_return(i, v) == 0;
 }
 EOF
--- a/scripts/atomic/fallbacks/try_cmpxchg
+++ b/scripts/atomic/fallbacks/try_cmpxchg
@@ -1,9 +1,9 @@
 cat <<EOF
 static __always_inline bool
-${atomic}_try_cmpxchg${order}(${atomic}_t *v, ${int} *old, ${int} new)
+${arch}${atomic}_try_cmpxchg${order}(${atomic}_t *v, ${int} *old, ${int} new)
 {
 	${int} r, o = *old;
-	r = ${atomic}_cmpxchg${order}(v, o, new);
+	r = ${arch}${atomic}_cmpxchg${order}(v, o, new);
 	if (unlikely(r != o))
 		*old = r;
 	return likely(r == o);
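The try_cmpxchg fallback above is the shape all the conditional fallbacks (fetch_add_unless, inc_not_zero, dec_unless_positive) build on. A stand-alone sketch of that pattern in plain C, using the GCC/Clang `__atomic` builtins as stand-ins for the kernel's `${arch}${atomic}_` ops (demo names are hypothetical, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the generated try_cmpxchg fallback: the builtin updates *old on
 * failure, exactly what the template's "if (r != o) *old = r;" provides.
 */
static bool demo_try_cmpxchg(int *v, int *old, int newv)
{
	return __atomic_compare_exchange_n(v, old, newv, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* fetch_add_unless built on top, mirroring the template above. */
static int demo_fetch_add_unless(int *v, int a, int u)
{
	int c = __atomic_load_n(v, __ATOMIC_RELAXED);

	do {
		if (c == u)
			break;
	} while (!demo_try_cmpxchg(v, &c, c + a));

	return c;	/* original value, as the kerneldoc promises */
}
```

On a failed compare-exchange, `c` is refreshed with the current value, so the loop re-tests the `c == u` condition each round rather than spinning blindly.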
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -2,10 +2,11 @@
 # SPDX-License-Identifier: GPL-2.0
 
 ATOMICDIR=$(dirname $0)
+ARCH=$2
 
 . ${ATOMICDIR}/atomic-tbl.sh
 
-#gen_template_fallback(template, meta, pfx, name, sfx, order, atomic, int, args...)
+#gen_template_fallback(template, meta, pfx, name, sfx, order, arch, atomic, int, args...)
 gen_template_fallback()
 {
 	local template="$1"; shift
@@ -14,10 +15,11 @@ gen_template_fallback()
 	local name="$1"; shift
 	local sfx="$1"; shift
 	local order="$1"; shift
+	local arch="$1"; shift
 	local atomic="$1"; shift
 	local int="$1"; shift
 
-	local atomicname="${atomic}_${pfx}${name}${sfx}${order}"
+	local atomicname="${arch}${atomic}_${pfx}${name}${sfx}${order}"
 
 	local ret="$(gen_ret_type "${meta}" "${int}")"
 	local retstmt="$(gen_ret_stmt "${meta}")"
@@ -32,7 +34,7 @@ gen_template_fallback()
 	fi
 }
 
-#gen_proto_fallback(meta, pfx, name, sfx, order, atomic, int, args...)
+#gen_proto_fallback(meta, pfx, name, sfx, order, arch, atomic, int, args...)
 gen_proto_fallback()
 {
 	local meta="$1"; shift
@@ -56,16 +58,17 @@ cat << EOF
 EOF
 }
 
-#gen_proto_order_variants(meta, pfx, name, sfx, atomic, int, args...)
+#gen_proto_order_variants(meta, pfx, name, sfx, arch, atomic, int, args...)
 gen_proto_order_variants()
 {
 	local meta="$1"; shift
 	local pfx="$1"; shift
 	local name="$1"; shift
 	local sfx="$1"; shift
-	local atomic="$1"
+	local arch="$1"
+	local atomic="$2"
 
-	local basename="${atomic}_${pfx}${name}${sfx}"
+	local basename="${arch}${atomic}_${pfx}${name}${sfx}"
 
 	local template="$(find_fallback_template "${pfx}" "${name}" "${sfx}" "${order}")"
 
@@ -94,7 +97,7 @@ gen_proto_order_variants()
 	gen_basic_fallbacks "${basename}"
 
 	if [ ! -z "${template}" ]; then
-		printf "#endif /* ${atomic}_${pfx}${name}${sfx} */\n\n"
+		printf "#endif /* ${arch}${atomic}_${pfx}${name}${sfx} */\n\n"
 		gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "" "$@"
 		gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_acquire" "$@"
 		gen_proto_fallback "${meta}" "${pfx}" "${name}" "${sfx}" "_release" "$@"
@@ -153,18 +156,15 @@ cat << EOF
 
 EOF
 
-for xchg in "xchg" "cmpxchg" "cmpxchg64"; do
+for xchg in "${ARCH}xchg" "${ARCH}cmpxchg" "${ARCH}cmpxchg64"; do
 	gen_xchg_fallbacks "${xchg}"
 done
 
 grep '^[a-z]' "$1" | while read name meta args; do
-	gen_proto "${meta}" "${name}" "atomic" "int" ${args}
+	gen_proto "${meta}" "${name}" "${ARCH}" "atomic" "int" ${args}
 done
 
 cat <<EOF
-#define atomic_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
-#define atomic_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
-
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
@@ -172,12 +172,9 @@ cat <<EOF
 EOF
 
 grep '^[a-z]' "$1" | while read name meta args; do
-	gen_proto "${meta}" "${name}" "atomic64" "s64" ${args}
+	gen_proto "${meta}" "${name}" "${ARCH}" "atomic64" "s64" ${args}
 done
 
 cat <<EOF
-#define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
-#define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
-
 #endif /* _LINUX_ATOMIC_FALLBACK_H */
 EOF
--- a/scripts/atomic/gen-atomics.sh
+++ b/scripts/atomic/gen-atomics.sh
@@ -10,10 +10,11 @@ LINUXDIR=${ATOMICDIR}/../..
 cat <<EOF |
 gen-atomic-instrumented.sh      asm-generic/atomic-instrumented.h
 gen-atomic-long.sh              asm-generic/atomic-long.h
+gen-atomic-fallback.sh          linux/atomic-arch-fallback.h		arch_
 gen-atomic-fallback.sh          linux/atomic-fallback.h
 EOF
-while read script header; do
-	/bin/sh ${ATOMICDIR}/${script} ${ATOMICTBL} > ${LINUXDIR}/include/${header}
+while read script header args; do
+	/bin/sh ${ATOMICDIR}/${script} ${ATOMICTBL} ${args} > ${LINUXDIR}/include/${header}
 	HASH="$(sha1sum ${LINUXDIR}/include/${header})"
 	HASH="${HASH%% *}"
 	printf "// %s\n" "${HASH}" >> ${LINUXDIR}/include/${header}
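The point of generating a second, `arch_`-prefixed header is layering: `atomic_*()` becomes an instrumented wrapper around the raw `arch_atomic_*()` op, so notrace/exception-entry code can call the `arch_` variant and stay free of sanitizer hooks. A minimal sketch of that split (demo names and a counter standing in for the KASAN/KCSAN checks; not the generated headers):

```c
#include <assert.h>

static int instrumentation_calls;	/* stand-in for kasan/kcsan checks */

/* Raw op: what gen-atomic-fallback.sh now emits with the arch_ prefix. */
static int arch_atomic_demo_add_return(int *v, int i)
{
	return __atomic_add_fetch(v, i, __ATOMIC_SEQ_CST);
}

/* Instrumented wrapper: what atomic-instrumented.h layers on top. */
static int atomic_demo_add_return(int *v, int i)
{
	instrumentation_calls++;	/* e.g. kasan_check_write(v, ...) */
	return arch_atomic_demo_add_return(v, i);
}
```

Ordinary kernel code keeps calling `atomic_*()` and gets the checks; code like poke_int3_handler() (patch 21 below in the series) calls `arch_atomic_*()` and gets none.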



^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 21/22] x86/int3: Avoid atomic instrumentation
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (19 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 20/22] locking/atomics: Flip fallbacks and instrumentation Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 14:47 ` [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized Peter Zijlstra
  21 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat

Use arch_atomic_*() and READ_ONCE_NOCHECK() to ensure nothing untoward
creeps in and ruins things.

That is: this is the INT3 text poke handler; strictly limit the code
that runs in it, lest we inadvertently hit yet another INT3.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -960,9 +960,9 @@ static struct bp_patching_desc *bp_desc;
 static __always_inline
 struct bp_patching_desc *try_get_desc(struct bp_patching_desc **descp)
 {
-	struct bp_patching_desc *desc = READ_ONCE(*descp); /* rcu_dereference */
+	struct bp_patching_desc *desc = READ_ONCE_NOCHECK(*descp); /* rcu_dereference */
 
-	if (!desc || !atomic_inc_not_zero(&desc->refs))
+	if (!desc || !arch_atomic_inc_not_zero(&desc->refs))
 		return NULL;
 
 	return desc;
@@ -971,7 +971,7 @@ struct bp_patching_desc *try_get_desc(st
 static __always_inline void put_desc(struct bp_patching_desc *desc)
 {
 	smp_mb__before_atomic();
-	atomic_dec(&desc->refs);
+	arch_atomic_dec(&desc->refs);
 }
 
 static __always_inline void *text_poke_addr(struct text_poke_loc *tp)
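The try_get_desc()/put_desc() pair above is a refcount-guarded access pattern: readers may only take a reference while the count is non-zero, and once the writer drops it to zero, new readers must fail. A self-contained C sketch of that protocol, with `__atomic` builtins standing in for `arch_atomic_inc_not_zero()`/`arch_atomic_dec()` (demo names, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_desc {
	int refs;
};

/* inc_not_zero: take a reference only if one is still held elsewhere. */
static struct demo_desc *demo_try_get_desc(struct demo_desc *desc)
{
	int c = __atomic_load_n(&desc->refs, __ATOMIC_RELAXED);

	do {
		if (c == 0)	/* teardown already won the race */
			return NULL;
	} while (!__atomic_compare_exchange_n(&desc->refs, &c, c + 1, false,
					      __ATOMIC_SEQ_CST,
					      __ATOMIC_SEQ_CST));
	return desc;
}

static void demo_put_desc(struct demo_desc *desc)
{
	__atomic_sub_fetch(&desc->refs, 1, __ATOMIC_SEQ_CST);
}
```

In the kernel the writer initializes refs to 1, publishes the descriptor, and later drops its own reference; an INT3 arriving after that final put sees refs == 0 and bails.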



^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
                   ` (20 preceding siblings ...)
  2020-02-19 14:47 ` [PATCH v3 21/22] x86/int3: Avoid atomic instrumentation Peter Zijlstra
@ 2020-02-19 14:47 ` Peter Zijlstra
  2020-02-19 16:06   ` Dmitry Vyukov
  21 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 14:47 UTC (permalink / raw)
  To: linux-kernel, linux-arch, rostedt
  Cc: peterz, mingo, joel, gregkh, gustavo, tglx, paulmck, josh,
	mathieu.desnoyers, jiangshanlai, luto, tony.luck, frederic,
	dan.carpenter, mhiramat, Dmitry Vyukov, Andrey Ryabinin

In order to ensure poke_int3_handler() is completely self-contained --
we call this while we're modifying other text, imagine the fun of
hitting another INT3 -- ensure that everything is free of sanitizer
crud.

Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/alternative.c       |    4 ++--
 arch/x86/kernel/traps.c             |    2 +-
 include/linux/compiler-clang.h      |    7 +++++++
 include/linux/compiler-gcc.h        |    6 ++++++
 include/linux/compiler.h            |    5 +++++
 include/linux/compiler_attributes.h |    1 +
 lib/bsearch.c                       |    2 +-
 7 files changed, 23 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -979,7 +979,7 @@ static __always_inline void *text_poke_a
 	return _stext + tp->rel_addr;
 }
 
-static int notrace patch_cmp(const void *key, const void *elt)
+static int notrace __no_sanitize patch_cmp(const void *key, const void *elt)
 {
 	struct text_poke_loc *tp = (struct text_poke_loc *) elt;
 
@@ -991,7 +991,7 @@ static int notrace patch_cmp(const void
 }
 NOKPROBE_SYMBOL(patch_cmp);
 
-int notrace poke_int3_handler(struct pt_regs *regs)
+int notrace __no_sanitize poke_int3_handler(struct pt_regs *regs)
 {
 	struct bp_patching_desc *desc;
 	struct text_poke_loc *tp;
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -496,7 +496,7 @@ dotraplinkage void do_general_protection
 }
 NOKPROBE_SYMBOL(do_general_protection);
 
-dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
+dotraplinkage void notrace __no_sanitize do_int3(struct pt_regs *regs, long error_code)
 {
 	if (poke_int3_handler(regs))
 		return;
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -24,6 +24,13 @@
 #define __no_sanitize_address
 #endif
 
+#if __has_feature(undefined_behavior_sanitizer)
+#define __no_sanitize_undefined \
+		__attribute__((no_sanitize("undefined")))
+#else
+#define __no_sanitize_undefined
+#endif
+
 /*
  * Not all versions of clang implement the type-generic versions
  * of the builtin overflow checkers. Fortunately, clang implements
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -145,6 +145,12 @@
 #define __no_sanitize_address
 #endif
 
+#if __has_attribute(__no_sanitize_undefined__)
+#define __no_sanitize_undefined __attribute__((no_sanitize_undefined))
+#else
+#define __no_sanitize_undefined
+#endif
+
 #if GCC_VERSION >= 50100
 #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
 #endif
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -199,6 +199,7 @@ void __read_once_size(const volatile voi
 	__READ_ONCE_SIZE;
 }
 
+#define __no_kasan __no_sanitize_address
 #ifdef CONFIG_KASAN
 /*
  * We can't declare function 'inline' because __no_sanitize_address conflicts
@@ -274,6 +275,10 @@ static __always_inline void __write_once
  */
 #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
 
+#define __no_ubsan __no_sanitize_undefined
+
+#define __no_sanitize __no_kasan __no_ubsan
+
 static __no_kasan_or_inline
 unsigned long read_word_at_a_time(const void *addr)
 {
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -41,6 +41,7 @@
 # define __GCC4_has_attribute___nonstring__           0
 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
 # define __GCC4_has_attribute___fallthrough__         0
+# define __GCC4_has_attribute___no_sanitize_undefined__ (__GNUC_MINOR__ >= 9)
 #endif
 
 /*
--- a/lib/bsearch.c
+++ b/lib/bsearch.c
@@ -28,7 +28,7 @@
  * the key and elements in the array are of the same type, you can use
  * the same comparison function for both sort() and bsearch().
  */
-void *bsearch(const void *key, const void *base, size_t num, size_t size,
+void __no_sanitize *bsearch(const void *key, const void *base, size_t num, size_t size,
 	      cmp_func_t cmp)
 {
 	const char *pivot;



^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
@ 2020-02-19 15:31   ` Steven Rostedt
  2020-02-19 16:56     ` Borislav Petkov
  2020-02-20  8:41   ` Will Deacon
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Will Deacon, Marc Zyngier,
	Michael Ellerman, Petr Mladek

On Wed, 19 Feb 2020 15:47:25 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Since there are already a number of sites (ARM64, PowerPC) that
> effectively nest nmi_enter(), lets make the primitive support this
> before adding even more.
> 


>  void SMIException(struct pt_regs *regs)
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -71,7 +71,7 @@ extern void irq_exit(void);
>  		printk_nmi_enter();				\
>  		lockdep_off();					\
>  		ftrace_nmi_enter();				\
> -		BUG_ON(in_nmi());				\
> +		BUG_ON(in_nmi() == NMI_MASK);			\
>  		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
>  		rcu_nmi_enter();				\
>  		trace_hardirq_enter();				\
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -26,13 +26,13 @@
>   *         PREEMPT_MASK:	0x000000ff
>   *         SOFTIRQ_MASK:	0x0000ff00
>   *         HARDIRQ_MASK:	0x000f0000
> - *             NMI_MASK:	0x00100000
> + *             NMI_MASK:	0x00f00000
>   * PREEMPT_NEED_RESCHED:	0x80000000
>   */
>  #define PREEMPT_BITS	8
>  #define SOFTIRQ_BITS	8
>  #define HARDIRQ_BITS	4
> -#define NMI_BITS	1
> +#define NMI_BITS	4
>  

Probably should document somewhere (in a comment above nmi_enter()?)
that we allow nmi_enter() to nest up to 15 times.

-- Steve
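Given the constants in the patch, the arithmetic behind that "15 times" is easy to model. A small sketch using the values quoted above (DEMO_ names are stand-ins, not kernel macros):

```c
#include <assert.h>

/*
 * preempt_count layout after NMI_BITS grows from 1 to 4: the NMI field
 * occupies bits 20-23, so in_nmi() saturates after 15 nested nmi_enter()s,
 * which is when the patch's BUG_ON(in_nmi() == NMI_MASK) fires.
 */
#define DEMO_NMI_SHIFT	20
#define DEMO_NMI_BITS	4
#define DEMO_NMI_MASK	(((1UL << DEMO_NMI_BITS) - 1) << DEMO_NMI_SHIFT)
#define DEMO_NMI_OFFSET	(1UL << DEMO_NMI_SHIFT)

static unsigned long demo_preempt_count;

static unsigned long demo_in_nmi(void)
{
	return demo_preempt_count & DEMO_NMI_MASK;
}

static void demo_nmi_enter(void)
{
	/* models BUG_ON(in_nmi() == NMI_MASK): no room for a 16th nesting */
	assert(demo_in_nmi() != DEMO_NMI_MASK);
	demo_preempt_count += DEMO_NMI_OFFSET;
}
```

After fifteen calls the field reads 0x00f00000 and the sixteenth would trip the BUG_ON, matching the limit Steve suggests documenting.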

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 14:47 ` [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE Peter Zijlstra
@ 2020-02-19 15:36   ` Steven Rostedt
  2020-02-19 15:40     ` Peter Zijlstra
  2020-02-19 15:47   ` Steven Rostedt
  1 sibling, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 15:47:28 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> --- a/arch/x86/lib/memcpy_32.c
> +++ b/arch/x86/lib/memcpy_32.c
> @@ -21,7 +21,7 @@ __visible void *memset(void *s, int c, s
>  }
>  EXPORT_SYMBOL(memset);
>  
> -__visible void *memmove(void *dest, const void *src, size_t n)
> +__visible notrace void *memmove(void *dest, const void *src, size_t n)
>  {
>  	int d0,d1,d2,d3,d4,d5;
>  	char *ret = dest;
> @@ -207,3 +207,8 @@ __visible void *memmove(void *dest, cons
>  
>  }
>  EXPORT_SYMBOL(memmove);

Hmm, for things like this, which is adding notrace because of a single
instance of it (although it is fine to trace in any other instance), it
would be nice to have a gcc helper that could call "memmove+5" which
would skip the tracing portion.

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 15:36   ` Steven Rostedt
@ 2020-02-19 15:40     ` Peter Zijlstra
  2020-02-19 15:55       ` Steven Rostedt
  2020-02-19 15:57       ` Peter Zijlstra
  0 siblings, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 15:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 10:36:14AM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 15:47:28 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > --- a/arch/x86/lib/memcpy_32.c
> > +++ b/arch/x86/lib/memcpy_32.c
> > @@ -21,7 +21,7 @@ __visible void *memset(void *s, int c, s
> >  }
> >  EXPORT_SYMBOL(memset);
> >  
> > -__visible void *memmove(void *dest, const void *src, size_t n)
> > +__visible notrace void *memmove(void *dest, const void *src, size_t n)
> >  {
> >  	int d0,d1,d2,d3,d4,d5;
> >  	char *ret = dest;
> > @@ -207,3 +207,8 @@ __visible void *memmove(void *dest, cons
> >  
> >  }
> >  EXPORT_SYMBOL(memmove);
> 
> Hmm, for things like this, which is adding notrace because of a single
> instance of it (although it is fine to trace in any other instance), it
> would be nice to have a gcc helper that could call "memmove+5" which
> would skip the tracing portion.

Or just open-code the memmove() in do_double_fault() I suppose. I don't
think we care about super optimized code there. It's the bloody ESPFIX
trainwreck.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation
  2020-02-19 14:47 ` [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation Peter Zijlstra
@ 2020-02-19 15:46   ` Steven Rostedt
  2020-02-19 16:03     ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Marco Elver, Mark Rutland

On Wed, 19 Feb 2020 15:47:40 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> From: Marco Elver <elver@google.com>
> 
> This adds KCSAN instrumentation to atomic-instrumented.h.
> 
> Signed-off-by: Marco Elver <elver@google.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> [peterz: removed the actual kcsan hooks]
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> ---
>  include/asm-generic/atomic-instrumented.h |  390 +++++++++++++++---------------
>  scripts/atomic/gen-atomic-instrumented.sh |   14 -
>  2 files changed, 212 insertions(+), 192 deletions(-)
> 


Does this and the rest of the series depend on the previous patches in
the series? Or can this be a series on to itself (patches 16-22)?

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 14:47 ` [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE Peter Zijlstra
  2020-02-19 15:36   ` Steven Rostedt
@ 2020-02-19 15:47   ` Steven Rostedt
  1 sibling, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 15:47:28 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Hitting the tracer or a kprobes from #DF is 'interesting', lets avoid
> that.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 14:47 ` [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}() Peter Zijlstra
@ 2020-02-19 15:49   ` Steven Rostedt
  2020-02-19 15:58     ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 15:47:32 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> To facilitate tracers that need RCU, add some helpers to wrap the
> magic required.
> 
> The problem is that we can call into tracers (trace events and
> function tracing) while RCU isn't watching and this can happen from
> any context, including NMI.
> 
> It is this latter that is causing most of the trouble; we must make
> sure in_nmi() returns true before we land in anything tracing,
> otherwise we cannot recover.
> 
> These helpers are macros because of header-hell; they're placed here
> + * because of the proximity to nmi_{enter,exit}().
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/hardirq.h |   32 ++++++++++++++++++++++++++++++++
 
> +/*
> + * Tracing vs RCU
> + * --------------
> + *
> + * tracepoints and function-tracing can happen when RCU isn't watching (idle,
> + * or early IRQ/NMI entry).
> + *
> + * When it happens during idle or early during IRQ entry, tracing will have
> + * to inform RCU that it ought to pay attention, this is done by calling
> + * rcu_irq_enter_irqsave().
> + *
> + * On NMI entry, we must be very careful that tracing only happens after we've
> + * incremented preempt_count(), otherwise we cannot tell we're in NMI and take
> + * the special path.
> + */
> +
> +#define trace_rcu_enter()					\
> +({								\
> +	unsigned long state = 0;				\
> +	if (!rcu_is_watching())	{				\
> +		rcu_irq_enter_irqsave();			\
> +		state = 1;					\
> +	}							\
> +	state;							\
> +})
> +
> +#define trace_rcu_exit(state)					\
> +do {								\
> +	if (state)						\
> +		rcu_irq_exit_irqsave();				\
> +} while (0)
> +

Is there a reason that these can't be static __always_inline functions?

-- Steve

>  #endif /* LINUX_HARDIRQ_H */
> 
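For illustration, the inline-function form Steve is asking about would look like the sketch below, with the RCU primitives stubbed out so it is self-contained (the stubs and demo names are hypothetical; the kernel uses the macro form to dodge header dependencies):

```c
#include <assert.h>
#include <stdbool.h>

/* Stubs modeling the RCU primitives the real helpers call. */
static bool stub_rcu_watching;
static int  stub_rcu_irq_depth;

static bool rcu_is_watching_stub(void)       { return stub_rcu_watching; }
static void rcu_irq_enter_irqsave_stub(void) { stub_rcu_irq_depth++; }
static void rcu_irq_exit_irqsave_stub(void)  { stub_rcu_irq_depth--; }

/* Inline-function equivalents of the trace_rcu_{enter,exit}() macros. */
static inline unsigned long demo_trace_rcu_enter(void)
{
	unsigned long state = 0;

	if (!rcu_is_watching_stub()) {
		rcu_irq_enter_irqsave_stub();
		state = 1;
	}
	return state;
}

static inline void demo_trace_rcu_exit(unsigned long state)
{
	if (state)		/* only undo what enter actually did */
		rcu_irq_exit_irqsave_stub();
}
```

The returned state makes the pair idempotent: if RCU was already watching, exit is a no-op, so the helpers can be sprinkled into tracing paths without double-accounting.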


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  2020-02-19 14:47 ` [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct Peter Zijlstra
@ 2020-02-19 15:50   ` Steven Rostedt
  2020-02-19 15:50   ` Steven Rostedt
  1 sibling, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Steven Rostedt (VMware)

On Wed, 19 Feb 2020 15:47:33 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> If we call into a tracer before in_nmi() becomes true, the tracer can
> no longer detect it is called from NMI context and behave correctly.
> 
> Therefore change nmi_{enter,exit}() to use __preempt_count_{add,sub}()
> as the normal preempt_count_{add,sub}() have a (desired) function
> trace entry.
> 
> This fixes a potential issue with current code; AFAICT when the
> function-tracer has stack-tracing enabled __trace_stack() will
> malfunction when it hits the preempt_count_add() function entry from
> NMI context.
> 
> Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread


* Re: [PATCH v3 10/22] x86,tracing: Add comments to do_nmi()
  2020-02-19 14:47 ` [PATCH v3 10/22] x86,tracing: Add comments to do_nmi() Peter Zijlstra
@ 2020-02-19 15:51   ` Steven Rostedt
  0 siblings, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 15:47:34 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Add a few comments to do_nmi() as a result of the audit.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>


-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}()
  2020-02-19 14:47 ` [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}() Peter Zijlstra
@ 2020-02-19 15:52   ` Steven Rostedt
  0 siblings, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 15:47:36 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Replace the opencoded (and incomplete) RCU manipulations with the new
> helpers to ensure a regular RCU context when calling into
> __ftrace_trace_stack().
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---


Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 14:47 ` [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again) Peter Zijlstra
@ 2020-02-19 15:53   ` Steven Rostedt
  2020-02-19 16:43   ` Paul E. McKenney
  1 sibling, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 15:47:37 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
> rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
> taught perf how to deal with not having an RCU context provided.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 15:40     ` Peter Zijlstra
@ 2020-02-19 15:55       ` Steven Rostedt
  2020-02-19 15:57       ` Peter Zijlstra
  1 sibling, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 15:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 16:40:31 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > Hmm, for things like this, which is adding notrace because of a single
> > instance of it (although it is fine to trace in any other instance), it
> > would be nice to have a gcc helper that could call "memmove+5" which
> > would skip the tracing portion.  
> 
> Or just open-code the memmove() in do_double_fault() I suppose. I don't
> think we care about super optimized code there. It's the bloody ESPFIX
> trainwreck.

Or just create a memmove_notrace() version and use that.

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 15:40     ` Peter Zijlstra
  2020-02-19 15:55       ` Steven Rostedt
@ 2020-02-19 15:57       ` Peter Zijlstra
  2020-02-19 16:04         ` Peter Zijlstra
  2020-02-20 12:17         ` Borislav Petkov
  1 sibling, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 15:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 04:40:31PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 10:36:14AM -0500, Steven Rostedt wrote:
> > On Wed, 19 Feb 2020 15:47:28 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > --- a/arch/x86/lib/memcpy_32.c
> > > +++ b/arch/x86/lib/memcpy_32.c
> > > @@ -21,7 +21,7 @@ __visible void *memset(void *s, int c, s
> > >  }
> > >  EXPORT_SYMBOL(memset);
> > >  
> > > -__visible void *memmove(void *dest, const void *src, size_t n)
> > > +__visible notrace void *memmove(void *dest, const void *src, size_t n)
> > >  {
> > >  	int d0,d1,d2,d3,d4,d5;
> > >  	char *ret = dest;
> > > @@ -207,3 +207,8 @@ __visible void *memmove(void *dest, cons
> > >  
> > >  }
> > >  EXPORT_SYMBOL(memmove);
> > 
> > Hmm, for things like this, which is adding notrace because of a single
> > instance of it (although it is fine to trace in any other instance), it
> > would be nice to have a gcc helper that could call "memmove+5" which
> > would skip the tracing portion.
> 
> Or just open-code the memmove() in do_double_fault() I suppose. I don't
> think we care about super optimized code there. It's the bloody ESPFIX
> trainwreck.

Something like so, I suppose...

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6ef00eb6fbb9..543de932dc7c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -350,14 +350,20 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
 		regs->ip == (unsigned long)native_irq_return_iret)
 	{
 		struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+		unsigned long *dst = &gpregs->ip;
+		unsigned long *src = (void *)regs->sp;
+		int i, count = 5;
 
 		/*
 		 * regs->sp points to the failing IRET frame on the
 		 * ESPFIX64 stack.  Copy it to the entry stack.  This fills
 		 * in gpregs->ss through gpregs->ip.
-		 *
 		 */
-		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
+		for (i = 0; i < count; i++) {
+			int idx = (dst <= src) ? i : count - i;
+			dst[idx] = src[idx];
+		}
+
 		gpregs->orig_ax = 0;  /* Missing (lost) #GP error code */
 
 		/*

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 15:49   ` Steven Rostedt
@ 2020-02-19 15:58     ` Peter Zijlstra
  2020-02-19 16:15       ` Steven Rostedt
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 15:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 10:49:03AM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 15:47:32 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:

> > These helpers are macros because of header-hell; they're placed here
> > because of the proximity to nmi_{enter,exit}().

^^^^

> > +#define trace_rcu_enter()					\
> > +({								\
> > +	unsigned long state = 0;				\
> > +	if (!rcu_is_watching())	{				\
> > +		rcu_irq_enter_irqsave();			\
> > +		state = 1;					\
> > +	}							\
> > +	state;							\
> > +})
> > +
> > +#define trace_rcu_exit(state)					\
> > +do {								\
> > +	if (state)						\
> > +		rcu_irq_exit_irqsave();				\
> > +} while (0)
> > +
> 
> Is there a reason that these can't be static __always_inline functions?

It can be done, but then we need fwd declarations of those RCU functions
somewhere outside of rcupdate.h. It's all a bit of a mess.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation
  2020-02-19 15:46   ` Steven Rostedt
@ 2020-02-19 16:03     ` Peter Zijlstra
  2020-02-19 16:50       ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Marco Elver, Mark Rutland

On Wed, Feb 19, 2020 at 10:46:26AM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 15:47:40 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > From: Marco Elver <elver@google.com>
> > 
> > This adds KCSAN instrumentation to atomic-instrumented.h.
> > 
> > Signed-off-by: Marco Elver <elver@google.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > [peterz: removed the actual kcsan hooks]
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > ---
> >  include/asm-generic/atomic-instrumented.h |  390 +++++++++++++++---------------
> >  scripts/atomic/gen-atomic-instrumented.sh |   14 -
> >  2 files changed, 212 insertions(+), 192 deletions(-)
> > 
> 
> 
> Does this and the rest of the series depend on the previous patches in
> the series? Or can this be a series on to itself (patches 16-22)?

It can probably stand on its own, but it very much is related, insofar
as it's fallout from staring at all this nonsense.

Without these, do_int3() can actually have accidental tracing before
reaching its nmi_enter().

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 15:57       ` Peter Zijlstra
@ 2020-02-19 16:04         ` Peter Zijlstra
  2020-02-19 16:12           ` Steven Rostedt
  2020-02-20 12:17         ` Borislav Petkov
  1 sibling, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 04:57:15PM +0100, Peter Zijlstra wrote:
> Something like so, I suppose...
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 6ef00eb6fbb9..543de932dc7c 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -350,14 +350,20 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
>  		regs->ip == (unsigned long)native_irq_return_iret)
>  	{
>  		struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
> +		unsigned long *dst = &gpregs->ip;
> +		unsigned long *src = (void *)regs->sp;
> +		int i, count = 5;
>  
>  		/*
>  		 * regs->sp points to the failing IRET frame on the
>  		 * ESPFIX64 stack.  Copy it to the entry stack.  This fills
>  		 * in gpregs->ss through gpregs->ip.
> -		 *
>  		 */
> -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> +		for (i = 0; i < count; i++) {
> +			int idx = (dst <= src) ? i : count - i;

That's an off-by-one for going backward; 'count - 1 - i' should work
better, or I should just stop typing for today ;-)

> +			dst[idx] = src[idx];
> +		}
> +
>  		gpregs->orig_ax = 0;  /* Missing (lost) #GP error code */
>  
>  		/*

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-19 14:47 ` [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized Peter Zijlstra
@ 2020-02-19 16:06   ` Dmitry Vyukov
  2020-02-19 16:30     ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Dmitry Vyukov @ 2020-02-19 16:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Wed, Feb 19, 2020 at 4:14 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> In order to ensure poke_int3_handler() is completely self contained --
> we call this while we're modifying other text, imagine the fun of
> hitting another INT3 -- ensure that everything is without sanitize
> crud.

+kasan-dev

Hi Peter,

How do we hit another INT3 here? Does the code do
out-of-bounds/use-after-free writes?
Debugging later silent memory corruption may be no less fun :)

Not sanitizing bsearch entirely is a bit unfortunate. We won't find
any bugs in it when called from other sites either.
It may deserve a comment at least. Tomorrow I may want to remove
__no_sanitize, just because sanitizing more is better, and there is no
int3 test that will fail and stop me from doing that...

It's quite fragile. Tomorrow the poke_int3_handler handler calls more or
fewer functions, and either way it's not detected by anything. And if
we ignore all but one function, it is still not helpful, right?
Depending on failure cause/mode, using kasan_disable/enable_current
may be a better option.


> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/alternative.c       |    4 ++--
>  arch/x86/kernel/traps.c             |    2 +-
>  include/linux/compiler-clang.h      |    7 +++++++
>  include/linux/compiler-gcc.h        |    6 ++++++
>  include/linux/compiler.h            |    5 +++++
>  include/linux/compiler_attributes.h |    1 +
>  lib/bsearch.c                       |    2 +-
>  7 files changed, 23 insertions(+), 4 deletions(-)
>
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -979,7 +979,7 @@ static __always_inline void *text_poke_a
>         return _stext + tp->rel_addr;
>  }
>
> -static int notrace patch_cmp(const void *key, const void *elt)
> +static int notrace __no_sanitize patch_cmp(const void *key, const void *elt)
>  {
>         struct text_poke_loc *tp = (struct text_poke_loc *) elt;
>
> @@ -991,7 +991,7 @@ static int notrace patch_cmp(const void
>  }
>  NOKPROBE_SYMBOL(patch_cmp);
>
> -int notrace poke_int3_handler(struct pt_regs *regs)
> +int notrace __no_sanitize poke_int3_handler(struct pt_regs *regs)
>  {
>         struct bp_patching_desc *desc;
>         struct text_poke_loc *tp;
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -496,7 +496,7 @@ dotraplinkage void do_general_protection
>  }
>  NOKPROBE_SYMBOL(do_general_protection);
>
> -dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> +dotraplinkage void notrace __no_sanitize do_int3(struct pt_regs *regs, long error_code)
>  {
>         if (poke_int3_handler(regs))
>                 return;
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -24,6 +24,13 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_feature(undefined_sanitizer)
> +#define __no_sanitize_undefined \
> +               __attribute__((no_sanitize("undefined")))
> +#else
> +#define __no_sanitize_undefined
> +#endif
> +
>  /*
>   * Not all versions of clang implement the the type-generic versions
>   * of the builtin overflow checkers. Fortunately, clang implements
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -145,6 +145,12 @@
>  #define __no_sanitize_address
>  #endif
>
> +#if __has_attribute(__no_sanitize_undefined__)
> +#define __no_sanitize_undefined __attribute__((no_sanitize_undefined))
> +#else
> +#define __no_sanitize_undefined
> +#endif
> +
>  #if GCC_VERSION >= 50100
>  #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
>  #endif
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -199,6 +199,7 @@ void __read_once_size(const volatile voi
>         __READ_ONCE_SIZE;
>  }
>
> +#define __no_kasan __no_sanitize_address
>  #ifdef CONFIG_KASAN
>  /*
>   * We can't declare function 'inline' because __no_sanitize_address confilcts
> @@ -274,6 +275,10 @@ static __always_inline void __write_once
>   */
>  #define READ_ONCE_NOCHECK(x) __READ_ONCE(x, 0)
>
> +#define __no_ubsan __no_sanitize_undefined
> +
> +#define __no_sanitize __no_kasan __no_ubsan
> +
>  static __no_kasan_or_inline
>  unsigned long read_word_at_a_time(const void *addr)
>  {
> --- a/include/linux/compiler_attributes.h
> +++ b/include/linux/compiler_attributes.h
> @@ -41,6 +41,7 @@
>  # define __GCC4_has_attribute___nonstring__           0
>  # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
>  # define __GCC4_has_attribute___fallthrough__         0
> +# define __GCC4_has_attribute___no_sanitize_undefined__ (__GNUC_MINOR__ >= 9)
>  #endif
>
>  /*
> --- a/lib/bsearch.c
> +++ b/lib/bsearch.c
> @@ -28,7 +28,7 @@
>   * the key and elements in the array are of the same type, you can use
>   * the same comparison function for both sort() and bsearch().
>   */
> -void *bsearch(const void *key, const void *base, size_t num, size_t size,
> +void __no_sanitize *bsearch(const void *key, const void *base, size_t num, size_t size,
>               cmp_func_t cmp)
>  {
>         const char *pivot;
>
>

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 16:04         ` Peter Zijlstra
@ 2020-02-19 16:12           ` Steven Rostedt
  2020-02-19 16:27             ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 16:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 17:04:42 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> > -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> > +		for (i = 0; i < count; i++) {
> > +			int idx = (dst <= src) ? i : count - i;  
> 
> That's an off-by-one for going backward; 'count - 1 - i' should work
> better, or I should just stop typing for today ;-)

Or, we could just cut and paste the current memmove and make a notrace
version too. Then we don't need to worry about bugs like this.

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 15:58     ` Peter Zijlstra
@ 2020-02-19 16:15       ` Steven Rostedt
  2020-02-19 16:35         ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 16:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 16:58:28 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Feb 19, 2020 at 10:49:03AM -0500, Steven Rostedt wrote:
> > On Wed, 19 Feb 2020 15:47:32 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:  
> 
> > > These helpers are macros because of header-hell; they're placed here
> > > > because of the proximity to nmi_{enter,exit}().  
> 
> ^^^^

Bah I can't read, because I even went looking for this!

> 
> > > +#define trace_rcu_enter()					\
> > > +({								\
> > > +	unsigned long state = 0;				\
> > > +	if (!rcu_is_watching())	{				\
> > > +		rcu_irq_enter_irqsave();			\
> > > +		state = 1;					\
> > > +	}							\
> > > +	state;							\
> > > +})
> > > +
> > > +#define trace_rcu_exit(state)					\
> > > +do {								\
> > > +	if (state)						\
> > > +		rcu_irq_exit_irqsave();				\
> > > +} while (0)
> > > +  
> > 
> > Is there a reason that these can't be static __always_inline functions?  
> 
> It can be done, but then we need fwd declarations of those RCU functions
> somewhere outside of rcupdate.h. It's all a bit of a mess.

Maybe this belongs in the rcupdate.h file then?

-- Steve

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 16:12           ` Steven Rostedt
@ 2020-02-19 16:27             ` Paul E. McKenney
  2020-02-19 16:34               ` Peter Zijlstra
  2020-02-19 17:05               ` Steven Rostedt
  0 siblings, 2 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 11:12:28AM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 17:04:42 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> > > +		for (i = 0; i < count; i++) {
> > > +			int idx = (dst <= src) ? i : count - i;  
> > 
> > That's an off-by-one for going backward; 'count - 1 - i' should work
> > better, or I should just stop typing for today ;-)
> 
> Or, we could just cut and paste the current memmove and make a notrace
> version too. Then we don't need to worry bout bugs like this.

OK, I will bite...

Can we just make the core be an inline function and make a notrace and
a trace caller?  Possibly going one step further and having one call
the other?  (Presumably the traceable version invoking the notrace
version, but it has been one good long time since I have looked at
function preambles.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-19 16:06   ` Dmitry Vyukov
@ 2020-02-19 16:30     ` Peter Zijlstra
  2020-02-19 16:51       ` Peter Zijlstra
  2020-02-19 17:20       ` Peter Zijlstra
  0 siblings, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:30 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Wed, Feb 19, 2020 at 05:06:03PM +0100, Dmitry Vyukov wrote:
> On Wed, Feb 19, 2020 at 4:14 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > In order to ensure poke_int3_handler() is completely self contained --
> > we call this while we're modifying other text, imagine the fun of
> > hitting another INT3 -- ensure that everything is without sanitize
> > crud.
> 
> +kasan-dev
> 
> Hi Peter,
> 
> How do we hit another INT3 here? 

INT3 is mostly the result of either kprobes (someone sticks a kprobe in
the middle of *SAN) or self modifying text stuff (jump_labels, ftrace
and soon static_call).

> Does the code do
> out-of-bounds/use-after-free writes?
> Debugging later silent memory corruption may be no less fun :)

It all stinks, debugging a recursive exception is also not fun.

> Not sanitizing bsearch entirely is a bit unfortunate. We won't find
> any bugs in it when called from other sites too.

Agreed.

> It may deserve a comment at least. Tomorrow I may want to remove
> __no_sanitize, just because sanitizing more is better, and no int3
> test will fail to stop me from doing that...

If only I actually had a test-case for this :/

> It's quite fragile. Tomorrow poke_int3_handler handler calls more of
> fewer functions, and both ways it's not detected by anything.

Yes; not having tools for this is pretty annoying. In 0/n I asked Dan if
smatch could do at least the normal tracing stuff; the compiler
instrumentation bits are going to be far more difficult because smatch
doesn't work at that level :/

(I actually have

> And if we ignore all by one function, it is still not helpful, right?
> Depending on failure cause/mode, using kasan_disable/enable_current
> may be a better option.

kasan_disable_current() could mostly work; but only covers kasan, not
ubsan or kcsan. It then also relies on kasan_disable_current() itself
being notrace, as well as on all the instrumentation functions themselves (which I
think is currently true because of mm/kasan/Makefile stripping
CC_FLAGS_FTRACE).

But what stops someone from sticking a kprobe or #DB before you check
that variable?

By inlining everything in poke_int3_handler() (except bsearch :/) we can
mark the whole function off limits to everything and call it a day. That
simplicity has been the guiding principle so far.

Alternatively we can provide an __always_inline variant of bsearch().

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  2020-02-19 14:47 ` [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi() Peter Zijlstra
@ 2020-02-19 16:31   ` Paul E. McKenney
  2020-02-19 16:37     ` Peter Zijlstra
  2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
  0 siblings, 2 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 03:47:29PM +0100, Peter Zijlstra wrote:
> From: Paul E. McKenney <paulmck@kernel.org>
> 
> The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
> "irq" parameter that indicates whether these functions are invoked from
> an irq handler (irq==true) or an NMI handler (irq==false).  However,
> recent changes have applied notrace to a few critical functions such
> > that rcu_nmi_enter_common() and rcu_nmi_exit_common() may now rely
> on in_nmi().  Note that in_nmi() works no differently than before,
> but rather that tracing is now prohibited in code regions where in_nmi()
> would incorrectly report NMI state.
> 
> This commit therefore removes the "irq" parameter and inlines
> rcu_nmi_enter_common() and rcu_nmi_exit_common() into rcu_nmi_enter()
> and rcu_nmi_exit(), respectively.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Again, thank you.

Would you like to also take the added comment for NOKPROBE_SYMBOL(),
or would you prefer that I carry that separately?  (I dropped it for
now to avoid the conflict with the patch below.)

Here is the latest version of that comment, posted by Steve Rostedt.

							Thanx, Paul

/*
 * All functions called in the breakpoint trap handler (e.g. do_int3()
 * on x86), must not allow kprobes until the kprobe breakpoint handler
 * is called, otherwise it can cause an infinite recursion.
 * On some archs, rcu_nmi_enter() is called in the breakpoint handler
 * before the kprobe breakpoint handler is called, thus it must be
 * marked as NOKPROBE.
 */

> ---
>  kernel/rcu/tree.c |   45 ++++++++++++++-------------------------------
>  1 file changed, 14 insertions(+), 31 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -614,16 +614,18 @@ void rcu_user_enter(void)
>  }
>  #endif /* CONFIG_NO_HZ_FULL */
>  
> -/*
> +/**
> + * rcu_nmi_exit - inform RCU of exit from NMI context
> + *
>   * If we are returning from the outermost NMI handler that interrupted an
>   * RCU-idle period, update rdp->dynticks and rdp->dynticks_nmi_nesting
>   * to let the RCU grace-period handling know that the CPU is back to
>   * being RCU-idle.
>   *
> - * If you add or remove a call to rcu_nmi_exit_common(), be sure to test
> + * If you add or remove a call to rcu_nmi_exit(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -static __always_inline void rcu_nmi_exit_common(bool irq)
> +void rcu_nmi_exit(void)
>  {
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  
> @@ -651,27 +653,16 @@ static __always_inline void rcu_nmi_exit
>  	trace_rcu_dyntick(TPS("Startirq"), rdp->dynticks_nmi_nesting, 0, atomic_read(&rdp->dynticks));
>  	WRITE_ONCE(rdp->dynticks_nmi_nesting, 0); /* Avoid store tearing. */
>  
> -	if (irq)
> +	if (!in_nmi())
>  		rcu_prepare_for_idle();
>  
>  	rcu_dynticks_eqs_enter();
>  
> -	if (irq)
> +	if (!in_nmi())
>  		rcu_dynticks_task_enter();
>  }
>  
>  /**
> - * rcu_nmi_exit - inform RCU of exit from NMI context
> - *
> - * If you add or remove a call to rcu_nmi_exit(), be sure to test
> - * with CONFIG_RCU_EQS_DEBUG=y.
> - */
> -void rcu_nmi_exit(void)
> -{
> -	rcu_nmi_exit_common(false);
> -}
> -
> -/**
>   * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
>   *
>   * Exit from an interrupt handler, which might possibly result in entering
> @@ -693,7 +684,7 @@ void rcu_nmi_exit(void)
>  void rcu_irq_exit(void)
>  {
>  	lockdep_assert_irqs_disabled();
> -	rcu_nmi_exit_common(true);
> +	rcu_nmi_exit();
>  }
>  
>  /*
> @@ -777,7 +768,7 @@ void rcu_user_exit(void)
>  #endif /* CONFIG_NO_HZ_FULL */
>  
>  /**
> - * rcu_nmi_enter_common - inform RCU of entry to NMI context
> + * rcu_nmi_enter - inform RCU of entry to NMI context
>   * @irq: Is this call from rcu_irq_enter?
>   *
>   * If the CPU was idle from RCU's viewpoint, update rdp->dynticks and
> @@ -786,10 +777,10 @@ void rcu_user_exit(void)
>   * long as the nesting level does not overflow an int.  (You will probably
>   * run out of stack space first.)
>   *
> - * If you add or remove a call to rcu_nmi_enter_common(), be sure to test
> + * If you add or remove a call to rcu_nmi_enter(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -static __always_inline void rcu_nmi_enter_common(bool irq)
> +void rcu_nmi_enter(void)
>  {
>  	long incby = 2;
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> @@ -807,12 +798,12 @@ static __always_inline void rcu_nmi_ente
>  	 */
>  	if (rcu_dynticks_curr_cpu_in_eqs()) {
>  
> -		if (irq)
> +		if (!in_nmi())
>  			rcu_dynticks_task_exit();
>  
>  		rcu_dynticks_eqs_exit();
>  
> -		if (irq)
> +		if (!in_nmi())
>  			rcu_cleanup_after_idle();
>  
>  		incby = 1;
> @@ -834,14 +825,6 @@ static __always_inline void rcu_nmi_ente
>  		   rdp->dynticks_nmi_nesting + incby);
>  	barrier();
>  }
> -
> -/**
> - * rcu_nmi_enter - inform RCU of entry to NMI context
> - */
> -void rcu_nmi_enter(void)
> -{
> -	rcu_nmi_enter_common(false);
> -}
>  NOKPROBE_SYMBOL(rcu_nmi_enter);
>  
>  /**
> @@ -869,7 +852,7 @@ NOKPROBE_SYMBOL(rcu_nmi_enter);
>  void rcu_irq_enter(void)
>  {
>  	lockdep_assert_irqs_disabled();
> -	rcu_nmi_enter_common(true);
> +	rcu_nmi_enter();
>  }
>  
>  /*
> 
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 16:27             ` Paul E. McKenney
@ 2020-02-19 16:34               ` Peter Zijlstra
  2020-02-19 16:46                 ` Paul E. McKenney
  2020-02-19 17:05               ` Steven Rostedt
  1 sibling, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 08:27:47AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 19, 2020 at 11:12:28AM -0500, Steven Rostedt wrote:
> > On Wed, 19 Feb 2020 17:04:42 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> > > > +		for (i = 0; i < count; i++) {
> > > > +			int idx = (dst <= src) ? i : count - i;  
> > > 
> > > That's an off-by-one for going backward; 'count - 1 - i' should work
> > > better, or I should just stop typing for today ;-)
> > 
> > Or, we could just cut and paste the current memmove and make a notrace
> > version too. Then we don't need to worry bout bugs like this.
> 
> OK, I will bite...
> 
> Can we just make the core be an inline function and make a notrace and
> a trace caller?  Possibly going one step further and having one call
> the other?  (Presumably the traceable version invoking the notrace
> version, but it has been one good long time since I have looked at
> function preambles.)

One complication is that GCC (and others) are prone to stick their own
implementation of memmove() (and other string functions) in at 'random'.
That is, it is up to the compiler's discretion whether or not to put a
call to memmove() in or just emit some random gibberish they feel has the
same effect.

So if we go play silly games like that, we need be careful (or just call
__memmove I suppose, which is supposed to avoid that IIRC).

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 16:15       ` Steven Rostedt
@ 2020-02-19 16:35         ` Peter Zijlstra
  2020-02-19 16:44           ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-arch, mingo, joel, gregkh, gustavo, tglx,
	paulmck, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 11:15:32AM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 16:58:28 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Feb 19, 2020 at 10:49:03AM -0500, Steven Rostedt wrote:
> > > On Wed, 19 Feb 2020 15:47:32 +0100
> > > Peter Zijlstra <peterz@infradead.org> wrote:  
> > 
> > > > These helpers are macros because of header-hell; they're placed here
> > > > because of the proximity to nmi_{enter,exit}().  
> > 
> > ^^^^
> 
> Bah I can't read, because I even went looking for this!
> 
> > 
> > > > +#define trace_rcu_enter()					\
> > > > +({								\
> > > > +	unsigned long state = 0;				\
> > > > +	if (!rcu_is_watching())	{				\
> > > > +		rcu_irq_enter_irqsave();			\
> > > > +		state = 1;					\
> > > > +	}							\
> > > > +	state;							\
> > > > +})
> > > > +
> > > > +#define trace_rcu_exit(state)					\
> > > > +do {								\
> > > > +	if (state)						\
> > > > +		rcu_irq_exit_irqsave();				\
> > > > +} while (0)
> > > > +  
> > > 
> > > Is there a reason that these can't be static __always_inline functions?  
> > 
> > It can be done, but then we need fwd declarations of those RCU functions
> > somewhere outside of rcupdate.h. It's all a bit of a mess.
> 
> Maybe this belongs in the rcupdate.h file then?

Possibly, and I suppose the current version is less obviously dependent
on the in_nmi() functionality than the previous one was, seeing how Paul
frobbed that all the way into the rcu_irq_enter*() implementation.

So sure, I can go move it I suppose.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  2020-02-19 16:31   ` Paul E. McKenney
@ 2020-02-19 16:37     ` Peter Zijlstra
  2020-02-19 16:45       ` Paul E. McKenney
  2020-02-19 17:03       ` Peter Zijlstra
  2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
  1 sibling, 2 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 08:31:56AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 19, 2020 at 03:47:29PM +0100, Peter Zijlstra wrote:
> > From: Paul E. McKenney <paulmck@kernel.org>
> > 
> > The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
> > "irq" parameter that indicates whether these functions are invoked from
> > an irq handler (irq==true) or an NMI handler (irq==false).  However,
> > recent changes have applied notrace to a few critical functions such
> > > that rcu_nmi_enter_common() and rcu_nmi_exit_common() may now rely
> > on in_nmi().  Note that in_nmi() works no differently than before,
> > but rather that tracing is now prohibited in code regions where in_nmi()
> > would incorrectly report NMI state.
> > 
> > This commit therefore removes the "irq" parameter and inlines
> > rcu_nmi_enter_common() and rcu_nmi_exit_common() into rcu_nmi_enter()
> > and rcu_nmi_exit(), respectively.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Again, thank you.
> 
> Would you like to also take the added comment for NOKPROBE_SYMBOL(),
> or would you prefer that I carry that separately?  (I dropped it for
> now to avoid the conflict with the patch below.)
> 
> Here is the latest version of that comment, posted by Steve Rostedt.
> 
> 							Thanx, Paul
> 
> /*
>  * All functions called in the breakpoint trap handler (e.g. do_int3()
>  * on x86), must not allow kprobes until the kprobe breakpoint handler
>  * is called, otherwise it can cause an infinite recursion.
>  * On some archs, rcu_nmi_enter() is called in the breakpoint handler
>  * before the kprobe breakpoint handler is called, thus it must be
>  * marked as NOKPROBE.
>  */

Oh right, let me stick that in a separate patch. Best we not lose that
I suppose ;-)

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson()
  2020-02-19 14:47 ` [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson() Peter Zijlstra
@ 2020-02-19 16:38   ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 03:47:30PM +0100, Peter Zijlstra wrote:
> The functions do in fact use local_irq_{save,restore}() and can
> therefore be used when IRQs are in fact disabled. Worse, they are
> already used in places where IRQs are disabled, leading to great
> confusion when reading the code.
> 
> Rename them to fix this confusion.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

My first reaction was "Hey, wait, where is the _irqrestore()?"

Nevertheless, especially since these are the only _irqson() functions:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  include/linux/rcupdate.h   |    4 ++--
>  include/linux/rcutiny.h    |    4 ++--
>  include/linux/rcutree.h    |    4 ++--
>  include/linux/tracepoint.h |    4 ++--
>  kernel/cpu_pm.c            |    4 ++--
>  kernel/rcu/tree.c          |    8 ++++----
>  kernel/trace/trace.c       |    4 ++--
>  7 files changed, 16 insertions(+), 16 deletions(-)
> 
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -120,9 +120,9 @@ static inline void rcu_init_nohz(void) {
>   */
>  #define RCU_NONIDLE(a) \
>  	do { \
> -		rcu_irq_enter_irqson(); \
> +		rcu_irq_enter_irqsave(); \
>  		do { a; } while (0); \
> -		rcu_irq_exit_irqson(); \
> +		rcu_irq_exit_irqsave(); \
>  	} while (0)
>  
>  /*
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -68,8 +68,8 @@ static inline int rcu_jiffies_till_stall
>  static inline void rcu_idle_enter(void) { }
>  static inline void rcu_idle_exit(void) { }
>  static inline void rcu_irq_enter(void) { }
> -static inline void rcu_irq_exit_irqson(void) { }
> -static inline void rcu_irq_enter_irqson(void) { }
> +static inline void rcu_irq_exit_irqsave(void) { }
> +static inline void rcu_irq_enter_irqsave(void) { }
>  static inline void rcu_irq_exit(void) { }
>  static inline void exit_rcu(void) { }
>  static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -46,8 +46,8 @@ void rcu_idle_enter(void);
>  void rcu_idle_exit(void);
>  void rcu_irq_enter(void);
>  void rcu_irq_exit(void);
> -void rcu_irq_enter_irqson(void);
> -void rcu_irq_exit_irqson(void);
> +void rcu_irq_enter_irqsave(void);
> +void rcu_irq_exit_irqsave(void);
>  
>  void exit_rcu(void);
>  
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -181,7 +181,7 @@ static inline struct tracepoint *tracepo
>  		 */							\
>  		if (rcuidle) {						\
>  			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> -			rcu_irq_enter_irqson();				\
> +			rcu_irq_enter_irqsave();			\
>  		}							\
>  									\
>  		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
> @@ -195,7 +195,7 @@ static inline struct tracepoint *tracepo
>  		}							\
>  									\
>  		if (rcuidle) {						\
> -			rcu_irq_exit_irqson();				\
> +			rcu_irq_exit_irqsave();				\
>  			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
>  		}							\
>  									\
> --- a/kernel/cpu_pm.c
> +++ b/kernel/cpu_pm.c
> @@ -24,10 +24,10 @@ static int cpu_pm_notify(enum cpu_pm_eve
>  	 * could be disfunctional in cpu idle. Copy RCU_NONIDLE code to let
>  	 * RCU know this.
>  	 */
> -	rcu_irq_enter_irqson();
> +	rcu_irq_enter_irqsave();
>  	ret = __atomic_notifier_call_chain(&cpu_pm_notifier_chain, event, NULL,
>  		nr_to_call, nr_calls);
> -	rcu_irq_exit_irqson();
> +	rcu_irq_exit_irqsave();
>  
>  	return notifier_to_errno(ret);
>  }
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -699,10 +699,10 @@ void rcu_irq_exit(void)
>  /*
>   * Wrapper for rcu_irq_exit() where interrupts are enabled.
>   *
> - * If you add or remove a call to rcu_irq_exit_irqson(), be sure to test
> + * If you add or remove a call to rcu_irq_exit_irqsave(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_irq_exit_irqson(void)
> +void rcu_irq_exit_irqsave(void)
>  {
>  	unsigned long flags;
>  
> @@ -875,10 +875,10 @@ void rcu_irq_enter(void)
>  /*
>   * Wrapper for rcu_irq_enter() where interrupts are enabled.
>   *
> - * If you add or remove a call to rcu_irq_enter_irqson(), be sure to test
> + * If you add or remove a call to rcu_irq_enter_irqsave(), be sure to test
>   * with CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_irq_enter_irqson(void)
> +void rcu_irq_enter_irqsave(void)
>  {
>  	unsigned long flags;
>  
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3004,9 +3004,9 @@ void __trace_stack(struct trace_array *t
>  	if (unlikely(in_nmi()))
>  		return;
>  
> -	rcu_irq_enter_irqson();
> +	rcu_irq_enter_irqsave();
>  	__ftrace_trace_stack(buffer, flags, skip, pc, NULL);
> -	rcu_irq_exit_irqson();
> +	rcu_irq_exit_irqsave();
>  }
>  
>  /**
> 
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline
  2020-02-19 14:47 ` [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline Peter Zijlstra
@ 2020-02-19 16:39   ` Paul E. McKenney
  2020-02-19 17:19     ` Steven Rostedt
  0 siblings, 1 reply; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 03:47:31PM +0100, Peter Zijlstra wrote:
> Since rcu_is_watching() is notrace (and needs to be, as it can be
> called from the tracers), make sure everything it in turn calls is
> notrace too.
> 
> To that effect, mark rcu_dynticks_curr_cpu_in_eqs() inline, which
> implies notrace, as the function is tiny.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

There was some controversy over inline vs. notrace, leading me to
ask whether we should use both inline and notrace here.  ;-)

Assuming that the usual tracing suspects are OK with it:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/rcu/tree.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -294,7 +294,7 @@ static void rcu_dynticks_eqs_online(void
>   *
>   * No ordering, as we are sampling CPU-local information.
>   */
> -static bool rcu_dynticks_curr_cpu_in_eqs(void)
> +static inline bool rcu_dynticks_curr_cpu_in_eqs(void)
>  {
>  	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
>  
> 
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 14:47 ` [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again) Peter Zijlstra
  2020-02-19 15:53   ` Steven Rostedt
@ 2020-02-19 16:43   ` Paul E. McKenney
  2020-02-19 16:47     ` Peter Zijlstra
  1 sibling, 1 reply; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 03:47:37PM +0100, Peter Zijlstra wrote:
> Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
> rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
> taught perf how to deal with not having an RCU context provided.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/tracepoint.h |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepo

Shouldn't we also get rid of this line above?

		int __maybe_unused __idx = 0;				\

							Thanx, Paul

>  		 * For rcuidle callers, use srcu since sched-rcu	\
>  		 * doesn't work from the idle path.			\
>  		 */							\
> -		if (rcuidle) {						\
> +		if (rcuidle)						\
>  			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
> -			rcu_irq_enter_irqsave();			\
> -		}							\
>  									\
>  		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
>  									\
> @@ -194,10 +192,8 @@ static inline struct tracepoint *tracepo
>  			} while ((++it_func_ptr)->func);		\
>  		}							\
>  									\
> -		if (rcuidle) {						\
> -			rcu_irq_exit_irqsave();				\
> +		if (rcuidle)						\
>  			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
> -		}							\
>  									\
>  		preempt_enable_notrace();				\
>  	} while (0)
> 
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 16:35         ` Peter Zijlstra
@ 2020-02-19 16:44           ` Paul E. McKenney
  2020-02-20 10:34             ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 05:35:35PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 11:15:32AM -0500, Steven Rostedt wrote:
> > On Wed, 19 Feb 2020 16:58:28 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Wed, Feb 19, 2020 at 10:49:03AM -0500, Steven Rostedt wrote:
> > > > On Wed, 19 Feb 2020 15:47:32 +0100
> > > > Peter Zijlstra <peterz@infradead.org> wrote:  
> > > 
> > > > > These helpers are macros because of header-hell; they're placed here
> > > > > because of the proximity to nmi_{enter,exit}().
> > > 
> > > ^^^^
> > 
> > Bah I can't read, because I even went looking for this!
> > 
> > > 
> > > > > +#define trace_rcu_enter()					\
> > > > > +({								\
> > > > > +	unsigned long state = 0;				\
> > > > > +	if (!rcu_is_watching())	{				\
> > > > > +		rcu_irq_enter_irqsave();			\
> > > > > +		state = 1;					\
> > > > > +	}							\
> > > > > +	state;							\
> > > > > +})
> > > > > +
> > > > > +#define trace_rcu_exit(state)					\
> > > > > +do {								\
> > > > > +	if (state)						\
> > > > > +		rcu_irq_exit_irqsave();				\
> > > > > +} while (0)
> > > > > +  
> > > > 
> > > > Is there a reason that these can't be static __always_inline functions?  
> > > 
> > > It can be done, but then we need fwd declarations of those RCU functions
> > > somewhere outside of rcupdate.h. It's all a bit of a mess.
> > 
> > Maybe this belongs in the rcupdate.h file then?
> 
> Possibly, and I suppose the current version is less obviously dependent
> on the in_nmi() functionality than the previous one was, seeing how Paul
> frobbed that all the way into the rcu_irq_enter*() implementation.
> 
> So sure, I can go move it I suppose.

No objections here.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  2020-02-19 16:37     ` Peter Zijlstra
@ 2020-02-19 16:45       ` Paul E. McKenney
  2020-02-19 17:03       ` Peter Zijlstra
  1 sibling, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 05:37:00PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:31:56AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 19, 2020 at 03:47:29PM +0100, Peter Zijlstra wrote:
> > > From: Paul E. McKenney <paulmck@kernel.org>
> > > 
> > > The rcu_nmi_enter_common() and rcu_nmi_exit_common() functions take an
> > > "irq" parameter that indicates whether these functions are invoked from
> > > an irq handler (irq==true) or an NMI handler (irq==false).  However,
> > > recent changes have applied notrace to a few critical functions such
> > > that rcu_nmi_enter_common() and rcu_nmi_exit_common() may now rely
> > > on in_nmi().  Note that in_nmi() itself works no differently than
> > > before; rather, tracing is now prohibited in code regions where in_nmi()
> > > would incorrectly report NMI state.
> > > 
> > > This commit therefore removes the "irq" parameter and inlines
> > > rcu_nmi_enter_common() and rcu_nmi_exit_common() into rcu_nmi_enter()
> > > and rcu_nmi_exit(), respectively.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > 
> > Again, thank you.
> > 
> > Would you like to also take the added comment for NOKPROBE_SYMBOL(),
> > or would you prefer that I carry that separately?  (I dropped it for
> > now to avoid the conflict with the patch below.)
> > 
> > Here is the latest version of that comment, posted by Steve Rostedt.
> > 
> > 							Thanx, Paul
> > 
> > /*
> >  * All functions called in the breakpoint trap handler (e.g. do_int3()
> >  * on x86), must not allow kprobes until the kprobe breakpoint handler
> >  * is called, otherwise it can cause an infinite recursion.
> >  * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> >  * before the kprobe breakpoint handler is called, thus it must be
> >  * marked as NOKPROBE.
> >  */
> 
> Oh right, let me stick that in a separate patch. Best we not lose that
> I suppose ;-)

There was a lot of effort spent on it, to be sure.  ;-) ;-) ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 16:34               ` Peter Zijlstra
@ 2020-02-19 16:46                 ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 05:34:09PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:27:47AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 19, 2020 at 11:12:28AM -0500, Steven Rostedt wrote:
> > > On Wed, 19 Feb 2020 17:04:42 +0100
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > > -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> > > > > +		for (i = 0; i < count; i++) {
> > > > > +			int idx = (dst <= src) ? i : count - i;  
> > > > 
> > > > That's an off-by-one for going backward; 'count - 1 - i' should work
> > > > better, or I should just stop typing for today ;-)
> > > 
> > > Or, we could just cut and paste the current memmove and make a notrace
> > > version too. Then we don't need to worry about bugs like this.
> > 
> > OK, I will bite...
> > 
> > Can we just make the core be an inline function and make a notrace and
> > a trace caller?  Possibly going one step further and having one call
> > the other?  (Presumably the traceable version invoking the notrace
> > version, but it has been one good long time since I have looked at
> > function preambles.)
> 
> One complication is that GCC (and others) are prone to stick their own
> implementation of memmove() (and other string functions) in at 'random'.
> That is, it is up to the compiler's discretion whether or not to put a
> call to memmove() in or just emit some random gibberish they feel has the
> same effect.
> 
> So if we go play silly games like that, we need be careful (or just call
> __memmove I suppose, which is supposed to avoid that IIRC).

Urgh, good point.  :-/

							Thanx, Paul

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 16:43   ` Paul E. McKenney
@ 2020-02-19 16:47     ` Peter Zijlstra
  2020-02-19 17:05       ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 08:43:56AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 19, 2020 at 03:47:37PM +0100, Peter Zijlstra wrote:
> > Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
> > rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
> > taught perf how to deal with not having an RCU context provided.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/tracepoint.h |    8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > --- a/include/linux/tracepoint.h
> > +++ b/include/linux/tracepoint.h
> > @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepo
> 
> Shouldn't we also get rid of this line above?
> 
> 		int __maybe_unused __idx = 0;				\
> 

Probably makes a lot of sense, lemme fix that!

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation
  2020-02-19 16:03     ` Peter Zijlstra
@ 2020-02-19 16:50       ` Paul E. McKenney
  2020-02-19 16:54         ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat, Marco Elver,
	Mark Rutland

On Wed, Feb 19, 2020 at 05:03:18PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 10:46:26AM -0500, Steven Rostedt wrote:
> > On Wed, 19 Feb 2020 15:47:40 +0100
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > From: Marco Elver <elver@google.com>
> > > 
> > > This adds KCSAN instrumentation to atomic-instrumented.h.
> > > 
> > > Signed-off-by: Marco Elver <elver@google.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > [peterz: removed the actual kcsan hooks]
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > > ---
> > >  include/asm-generic/atomic-instrumented.h |  390 +++++++++++++++---------------
> > >  scripts/atomic/gen-atomic-instrumented.sh |   14 -
> > >  2 files changed, 212 insertions(+), 192 deletions(-)
> > > 
> > 
> > 
> > Does this and the rest of the series depend on the previous patches in
> > the series? Or can this be a series unto itself (patches 16-22)?
> 
> It can probably stand on its own, but it very much is related insofar
> as it's fallout from staring at all this nonsense.
> 
> Without these do_int3() can actually have accidental tracing before
> reaching its nmi_enter().

The original is already in -tip, so some merge magic will be required.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-19 16:30     ` Peter Zijlstra
@ 2020-02-19 16:51       ` Peter Zijlstra
  2020-02-19 17:20       ` Peter Zijlstra
  1 sibling, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:51 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Wed, Feb 19, 2020 at 05:30:25PM +0100, Peter Zijlstra wrote:
> > It's quite fragile. Tomorrow the poke_int3_handler() handler calls more or
> > fewer functions, and either way it's not detected by anything.
> 
> Yes; not having tools for this is pretty annoying. In 0/n I asked Dan if
> smatch could do at least the normal tracing stuff, the compiler
> instrumentation bits are going to be far more difficult because smatch
> doesn't work at that level :/
> 
> (I actually have

... and I stopped typing ...

I think I meant to say something like: ... more changes to
poke_int3_handler() pending, but they're all quite simple).

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation
  2020-02-19 16:50       ` Paul E. McKenney
@ 2020-02-19 16:54         ` Peter Zijlstra
  2020-02-19 17:36           ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 16:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat, Marco Elver,
	Mark Rutland

On Wed, Feb 19, 2020 at 08:50:20AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 19, 2020 at 05:03:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 19, 2020 at 10:46:26AM -0500, Steven Rostedt wrote:
> > > On Wed, 19 Feb 2020 15:47:40 +0100
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > From: Marco Elver <elver@google.com>
> > > > 
> > > > This adds KCSAN instrumentation to atomic-instrumented.h.
> > > > 
> > > > Signed-off-by: Marco Elver <elver@google.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > [peterz: removed the actual kcsan hooks]
> > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > > > ---
> > > >  include/asm-generic/atomic-instrumented.h |  390 +++++++++++++++---------------
> > > >  scripts/atomic/gen-atomic-instrumented.sh |   14 -
> > > >  2 files changed, 212 insertions(+), 192 deletions(-)
> > > > 
> > > 
> > > 
> > > Does this and the rest of the series depend on the previous patches in
> > > the series? Or can this be a series unto itself (patches 16-22)?
> > 
> > It can probably stand on its own, but it very much is related insofar
> > as it's fallout from staring at all this nonsense.
> > 
> > Without these do_int3() can actually have accidental tracing before
> > reaching its nmi_enter().
> 
> The original is already in -tip, so some merge magic will be required.

Yes. So I don't strictly need this one, but I do need the next two
patches adding __always_inline to everything. I figured it was easier to
also pick this one (and butcher it) than to rebase everything.

I didn't want to depend on the locking/kcsan tree, and if this goes in,
we do have to do something 'funny' there. Maybe rebase, maybe put in a
few kcsan stubs so the original patch at least compiles. We'll see :/

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers
  2020-02-19 14:47 ` [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers Peter Zijlstra
@ 2020-02-19 16:55   ` Paul E. McKenney
  2020-02-19 17:06     ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 16:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Mark Rutland, Marco Elver

On Wed, Feb 19, 2020 at 03:47:42PM +0100, Peter Zijlstra wrote:
> While the fallback wrappers aren't pure wrappers, they are trivial
> nonetheless, and the function they wrap should determine the final
> inlining policy.
> 
> For x86 tinyconfig we observe:
>  - vmlinux baseline: 1315988
>  - vmlinux with patch: 1315928 (-60 bytes)
> 
> Suggested-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Marco Elver <elver@google.com>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

And this one and the previous one are also already in -tip, FYI.

							Thanx, Paul

> ---
> diff --git a/include/linux/atomic-fallback.h b/include/linux/atomic-fallback.h
> index a7d240e465c0..656b5489b673 100644
> --- a/include/linux/atomic-fallback.h
> +++ b/include/linux/atomic-fallback.h
> @@ -6,6 +6,8 @@
>  #ifndef _LINUX_ATOMIC_FALLBACK_H
>  #define _LINUX_ATOMIC_FALLBACK_H
>  
> +#include <linux/compiler.h>
> +
>  #ifndef xchg_relaxed
>  #define xchg_relaxed		xchg
>  #define xchg_acquire		xchg
> @@ -76,7 +78,7 @@
>  #endif /* cmpxchg64_relaxed */
>  
>  #ifndef atomic_read_acquire
> -static inline int
> +static __always_inline int
>  atomic_read_acquire(const atomic_t *v)
>  {
>  	return smp_load_acquire(&(v)->counter);
> @@ -85,7 +87,7 @@ atomic_read_acquire(const atomic_t *v)
>  #endif
>  
>  #ifndef atomic_set_release
> -static inline void
> +static __always_inline void
>  atomic_set_release(atomic_t *v, int i)
>  {
>  	smp_store_release(&(v)->counter, i);
> @@ -100,7 +102,7 @@ atomic_set_release(atomic_t *v, int i)
>  #else /* atomic_add_return_relaxed */
>  
>  #ifndef atomic_add_return_acquire
> -static inline int
> +static __always_inline int
>  atomic_add_return_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_add_return_relaxed(i, v);
> @@ -111,7 +113,7 @@ atomic_add_return_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_add_return_release
> -static inline int
> +static __always_inline int
>  atomic_add_return_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -121,7 +123,7 @@ atomic_add_return_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_add_return
> -static inline int
> +static __always_inline int
>  atomic_add_return(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -142,7 +144,7 @@ atomic_add_return(int i, atomic_t *v)
>  #else /* atomic_fetch_add_relaxed */
>  
>  #ifndef atomic_fetch_add_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_add_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_fetch_add_relaxed(i, v);
> @@ -153,7 +155,7 @@ atomic_fetch_add_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_add_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_add_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -163,7 +165,7 @@ atomic_fetch_add_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_add
> -static inline int
> +static __always_inline int
>  atomic_fetch_add(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -184,7 +186,7 @@ atomic_fetch_add(int i, atomic_t *v)
>  #else /* atomic_sub_return_relaxed */
>  
>  #ifndef atomic_sub_return_acquire
> -static inline int
> +static __always_inline int
>  atomic_sub_return_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_sub_return_relaxed(i, v);
> @@ -195,7 +197,7 @@ atomic_sub_return_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_sub_return_release
> -static inline int
> +static __always_inline int
>  atomic_sub_return_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -205,7 +207,7 @@ atomic_sub_return_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_sub_return
> -static inline int
> +static __always_inline int
>  atomic_sub_return(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -226,7 +228,7 @@ atomic_sub_return(int i, atomic_t *v)
>  #else /* atomic_fetch_sub_relaxed */
>  
>  #ifndef atomic_fetch_sub_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_sub_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_fetch_sub_relaxed(i, v);
> @@ -237,7 +239,7 @@ atomic_fetch_sub_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_sub_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_sub_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -247,7 +249,7 @@ atomic_fetch_sub_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_sub
> -static inline int
> +static __always_inline int
>  atomic_fetch_sub(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -262,7 +264,7 @@ atomic_fetch_sub(int i, atomic_t *v)
>  #endif /* atomic_fetch_sub_relaxed */
>  
>  #ifndef atomic_inc
> -static inline void
> +static __always_inline void
>  atomic_inc(atomic_t *v)
>  {
>  	atomic_add(1, v);
> @@ -278,7 +280,7 @@ atomic_inc(atomic_t *v)
>  #endif /* atomic_inc_return */
>  
>  #ifndef atomic_inc_return
> -static inline int
> +static __always_inline int
>  atomic_inc_return(atomic_t *v)
>  {
>  	return atomic_add_return(1, v);
> @@ -287,7 +289,7 @@ atomic_inc_return(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_inc_return_acquire
> -static inline int
> +static __always_inline int
>  atomic_inc_return_acquire(atomic_t *v)
>  {
>  	return atomic_add_return_acquire(1, v);
> @@ -296,7 +298,7 @@ atomic_inc_return_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_inc_return_release
> -static inline int
> +static __always_inline int
>  atomic_inc_return_release(atomic_t *v)
>  {
>  	return atomic_add_return_release(1, v);
> @@ -305,7 +307,7 @@ atomic_inc_return_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_inc_return_relaxed
> -static inline int
> +static __always_inline int
>  atomic_inc_return_relaxed(atomic_t *v)
>  {
>  	return atomic_add_return_relaxed(1, v);
> @@ -316,7 +318,7 @@ atomic_inc_return_relaxed(atomic_t *v)
>  #else /* atomic_inc_return_relaxed */
>  
>  #ifndef atomic_inc_return_acquire
> -static inline int
> +static __always_inline int
>  atomic_inc_return_acquire(atomic_t *v)
>  {
>  	int ret = atomic_inc_return_relaxed(v);
> @@ -327,7 +329,7 @@ atomic_inc_return_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_inc_return_release
> -static inline int
> +static __always_inline int
>  atomic_inc_return_release(atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -337,7 +339,7 @@ atomic_inc_return_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_inc_return
> -static inline int
> +static __always_inline int
>  atomic_inc_return(atomic_t *v)
>  {
>  	int ret;
> @@ -359,7 +361,7 @@ atomic_inc_return(atomic_t *v)
>  #endif /* atomic_fetch_inc */
>  
>  #ifndef atomic_fetch_inc
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc(atomic_t *v)
>  {
>  	return atomic_fetch_add(1, v);
> @@ -368,7 +370,7 @@ atomic_fetch_inc(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_inc_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc_acquire(atomic_t *v)
>  {
>  	return atomic_fetch_add_acquire(1, v);
> @@ -377,7 +379,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_inc_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc_release(atomic_t *v)
>  {
>  	return atomic_fetch_add_release(1, v);
> @@ -386,7 +388,7 @@ atomic_fetch_inc_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_inc_relaxed
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc_relaxed(atomic_t *v)
>  {
>  	return atomic_fetch_add_relaxed(1, v);
> @@ -397,7 +399,7 @@ atomic_fetch_inc_relaxed(atomic_t *v)
>  #else /* atomic_fetch_inc_relaxed */
>  
>  #ifndef atomic_fetch_inc_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc_acquire(atomic_t *v)
>  {
>  	int ret = atomic_fetch_inc_relaxed(v);
> @@ -408,7 +410,7 @@ atomic_fetch_inc_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_inc_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc_release(atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -418,7 +420,7 @@ atomic_fetch_inc_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_inc
> -static inline int
> +static __always_inline int
>  atomic_fetch_inc(atomic_t *v)
>  {
>  	int ret;
> @@ -433,7 +435,7 @@ atomic_fetch_inc(atomic_t *v)
>  #endif /* atomic_fetch_inc_relaxed */
>  
>  #ifndef atomic_dec
> -static inline void
> +static __always_inline void
>  atomic_dec(atomic_t *v)
>  {
>  	atomic_sub(1, v);
> @@ -449,7 +451,7 @@ atomic_dec(atomic_t *v)
>  #endif /* atomic_dec_return */
>  
>  #ifndef atomic_dec_return
> -static inline int
> +static __always_inline int
>  atomic_dec_return(atomic_t *v)
>  {
>  	return atomic_sub_return(1, v);
> @@ -458,7 +460,7 @@ atomic_dec_return(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_return_acquire
> -static inline int
> +static __always_inline int
>  atomic_dec_return_acquire(atomic_t *v)
>  {
>  	return atomic_sub_return_acquire(1, v);
> @@ -467,7 +469,7 @@ atomic_dec_return_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_return_release
> -static inline int
> +static __always_inline int
>  atomic_dec_return_release(atomic_t *v)
>  {
>  	return atomic_sub_return_release(1, v);
> @@ -476,7 +478,7 @@ atomic_dec_return_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_return_relaxed
> -static inline int
> +static __always_inline int
>  atomic_dec_return_relaxed(atomic_t *v)
>  {
>  	return atomic_sub_return_relaxed(1, v);
> @@ -487,7 +489,7 @@ atomic_dec_return_relaxed(atomic_t *v)
>  #else /* atomic_dec_return_relaxed */
>  
>  #ifndef atomic_dec_return_acquire
> -static inline int
> +static __always_inline int
>  atomic_dec_return_acquire(atomic_t *v)
>  {
>  	int ret = atomic_dec_return_relaxed(v);
> @@ -498,7 +500,7 @@ atomic_dec_return_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_return_release
> -static inline int
> +static __always_inline int
>  atomic_dec_return_release(atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -508,7 +510,7 @@ atomic_dec_return_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_return
> -static inline int
> +static __always_inline int
>  atomic_dec_return(atomic_t *v)
>  {
>  	int ret;
> @@ -530,7 +532,7 @@ atomic_dec_return(atomic_t *v)
>  #endif /* atomic_fetch_dec */
>  
>  #ifndef atomic_fetch_dec
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec(atomic_t *v)
>  {
>  	return atomic_fetch_sub(1, v);
> @@ -539,7 +541,7 @@ atomic_fetch_dec(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_dec_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec_acquire(atomic_t *v)
>  {
>  	return atomic_fetch_sub_acquire(1, v);
> @@ -548,7 +550,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_dec_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec_release(atomic_t *v)
>  {
>  	return atomic_fetch_sub_release(1, v);
> @@ -557,7 +559,7 @@ atomic_fetch_dec_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_dec_relaxed
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec_relaxed(atomic_t *v)
>  {
>  	return atomic_fetch_sub_relaxed(1, v);
> @@ -568,7 +570,7 @@ atomic_fetch_dec_relaxed(atomic_t *v)
>  #else /* atomic_fetch_dec_relaxed */
>  
>  #ifndef atomic_fetch_dec_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec_acquire(atomic_t *v)
>  {
>  	int ret = atomic_fetch_dec_relaxed(v);
> @@ -579,7 +581,7 @@ atomic_fetch_dec_acquire(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_dec_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec_release(atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -589,7 +591,7 @@ atomic_fetch_dec_release(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_dec
> -static inline int
> +static __always_inline int
>  atomic_fetch_dec(atomic_t *v)
>  {
>  	int ret;
> @@ -610,7 +612,7 @@ atomic_fetch_dec(atomic_t *v)
>  #else /* atomic_fetch_and_relaxed */
>  
>  #ifndef atomic_fetch_and_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_and_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_fetch_and_relaxed(i, v);
> @@ -621,7 +623,7 @@ atomic_fetch_and_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_and_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_and_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -631,7 +633,7 @@ atomic_fetch_and_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_and
> -static inline int
> +static __always_inline int
>  atomic_fetch_and(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -646,7 +648,7 @@ atomic_fetch_and(int i, atomic_t *v)
>  #endif /* atomic_fetch_and_relaxed */
>  
>  #ifndef atomic_andnot
> -static inline void
> +static __always_inline void
>  atomic_andnot(int i, atomic_t *v)
>  {
>  	atomic_and(~i, v);
> @@ -662,7 +664,7 @@ atomic_andnot(int i, atomic_t *v)
>  #endif /* atomic_fetch_andnot */
>  
>  #ifndef atomic_fetch_andnot
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot(int i, atomic_t *v)
>  {
>  	return atomic_fetch_and(~i, v);
> @@ -671,7 +673,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_andnot_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot_acquire(int i, atomic_t *v)
>  {
>  	return atomic_fetch_and_acquire(~i, v);
> @@ -680,7 +682,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_andnot_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot_release(int i, atomic_t *v)
>  {
>  	return atomic_fetch_and_release(~i, v);
> @@ -689,7 +691,7 @@ atomic_fetch_andnot_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_andnot_relaxed
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot_relaxed(int i, atomic_t *v)
>  {
>  	return atomic_fetch_and_relaxed(~i, v);
> @@ -700,7 +702,7 @@ atomic_fetch_andnot_relaxed(int i, atomic_t *v)
>  #else /* atomic_fetch_andnot_relaxed */
>  
>  #ifndef atomic_fetch_andnot_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_fetch_andnot_relaxed(i, v);
> @@ -711,7 +713,7 @@ atomic_fetch_andnot_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_andnot_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -721,7 +723,7 @@ atomic_fetch_andnot_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_andnot
> -static inline int
> +static __always_inline int
>  atomic_fetch_andnot(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -742,7 +744,7 @@ atomic_fetch_andnot(int i, atomic_t *v)
>  #else /* atomic_fetch_or_relaxed */
>  
>  #ifndef atomic_fetch_or_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_or_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_fetch_or_relaxed(i, v);
> @@ -753,7 +755,7 @@ atomic_fetch_or_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_or_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_or_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -763,7 +765,7 @@ atomic_fetch_or_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_or
> -static inline int
> +static __always_inline int
>  atomic_fetch_or(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -784,7 +786,7 @@ atomic_fetch_or(int i, atomic_t *v)
>  #else /* atomic_fetch_xor_relaxed */
>  
>  #ifndef atomic_fetch_xor_acquire
> -static inline int
> +static __always_inline int
>  atomic_fetch_xor_acquire(int i, atomic_t *v)
>  {
>  	int ret = atomic_fetch_xor_relaxed(i, v);
> @@ -795,7 +797,7 @@ atomic_fetch_xor_acquire(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_xor_release
> -static inline int
> +static __always_inline int
>  atomic_fetch_xor_release(int i, atomic_t *v)
>  {
>  	__atomic_release_fence();
> @@ -805,7 +807,7 @@ atomic_fetch_xor_release(int i, atomic_t *v)
>  #endif
>  
>  #ifndef atomic_fetch_xor
> -static inline int
> +static __always_inline int
>  atomic_fetch_xor(int i, atomic_t *v)
>  {
>  	int ret;
> @@ -826,7 +828,7 @@ atomic_fetch_xor(int i, atomic_t *v)
>  #else /* atomic_xchg_relaxed */
>  
>  #ifndef atomic_xchg_acquire
> -static inline int
> +static __always_inline int
>  atomic_xchg_acquire(atomic_t *v, int i)
>  {
>  	int ret = atomic_xchg_relaxed(v, i);
> @@ -837,7 +839,7 @@ atomic_xchg_acquire(atomic_t *v, int i)
>  #endif
>  
>  #ifndef atomic_xchg_release
> -static inline int
> +static __always_inline int
>  atomic_xchg_release(atomic_t *v, int i)
>  {
>  	__atomic_release_fence();
> @@ -847,7 +849,7 @@ atomic_xchg_release(atomic_t *v, int i)
>  #endif
>  
>  #ifndef atomic_xchg
> -static inline int
> +static __always_inline int
>  atomic_xchg(atomic_t *v, int i)
>  {
>  	int ret;
> @@ -868,7 +870,7 @@ atomic_xchg(atomic_t *v, int i)
>  #else /* atomic_cmpxchg_relaxed */
>  
>  #ifndef atomic_cmpxchg_acquire
> -static inline int
> +static __always_inline int
>  atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
>  {
>  	int ret = atomic_cmpxchg_relaxed(v, old, new);
> @@ -879,7 +881,7 @@ atomic_cmpxchg_acquire(atomic_t *v, int old, int new)
>  #endif
>  
>  #ifndef atomic_cmpxchg_release
> -static inline int
> +static __always_inline int
>  atomic_cmpxchg_release(atomic_t *v, int old, int new)
>  {
>  	__atomic_release_fence();
> @@ -889,7 +891,7 @@ atomic_cmpxchg_release(atomic_t *v, int old, int new)
>  #endif
>  
>  #ifndef atomic_cmpxchg
> -static inline int
> +static __always_inline int
>  atomic_cmpxchg(atomic_t *v, int old, int new)
>  {
>  	int ret;
> @@ -911,7 +913,7 @@ atomic_cmpxchg(atomic_t *v, int old, int new)
>  #endif /* atomic_try_cmpxchg */
>  
>  #ifndef atomic_try_cmpxchg
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg(atomic_t *v, int *old, int new)
>  {
>  	int r, o = *old;
> @@ -924,7 +926,7 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
>  #endif
>  
>  #ifndef atomic_try_cmpxchg_acquire
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
>  {
>  	int r, o = *old;
> @@ -937,7 +939,7 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
>  #endif
>  
>  #ifndef atomic_try_cmpxchg_release
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
>  {
>  	int r, o = *old;
> @@ -950,7 +952,7 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
>  #endif
>  
>  #ifndef atomic_try_cmpxchg_relaxed
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
>  {
>  	int r, o = *old;
> @@ -965,7 +967,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
>  #else /* atomic_try_cmpxchg_relaxed */
>  
>  #ifndef atomic_try_cmpxchg_acquire
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
>  {
>  	bool ret = atomic_try_cmpxchg_relaxed(v, old, new);
> @@ -976,7 +978,7 @@ atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
>  #endif
>  
>  #ifndef atomic_try_cmpxchg_release
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
>  {
>  	__atomic_release_fence();
> @@ -986,7 +988,7 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
>  #endif
>  
>  #ifndef atomic_try_cmpxchg
> -static inline bool
> +static __always_inline bool
>  atomic_try_cmpxchg(atomic_t *v, int *old, int new)
>  {
>  	bool ret;
> @@ -1010,7 +1012,7 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
>   * true if the result is zero, or false for all
>   * other cases.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic_sub_and_test(int i, atomic_t *v)
>  {
>  	return atomic_sub_return(i, v) == 0;
> @@ -1027,7 +1029,7 @@ atomic_sub_and_test(int i, atomic_t *v)
>   * returns true if the result is 0, or false for all other
>   * cases.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic_dec_and_test(atomic_t *v)
>  {
>  	return atomic_dec_return(v) == 0;
> @@ -1044,7 +1046,7 @@ atomic_dec_and_test(atomic_t *v)
>   * and returns true if the result is zero, or false for all
>   * other cases.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic_inc_and_test(atomic_t *v)
>  {
>  	return atomic_inc_return(v) == 0;
> @@ -1062,7 +1064,7 @@ atomic_inc_and_test(atomic_t *v)
>   * if the result is negative, or false when
>   * result is greater than or equal to zero.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic_add_negative(int i, atomic_t *v)
>  {
>  	return atomic_add_return(i, v) < 0;
> @@ -1080,7 +1082,7 @@ atomic_add_negative(int i, atomic_t *v)
>   * Atomically adds @a to @v, so long as @v was not already @u.
>   * Returns original value of @v
>   */
> -static inline int
> +static __always_inline int
>  atomic_fetch_add_unless(atomic_t *v, int a, int u)
>  {
>  	int c = atomic_read(v);
> @@ -1105,7 +1107,7 @@ atomic_fetch_add_unless(atomic_t *v, int a, int u)
>   * Atomically adds @a to @v, if @v was not already @u.
>   * Returns true if the addition was done.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic_add_unless(atomic_t *v, int a, int u)
>  {
>  	return atomic_fetch_add_unless(v, a, u) != u;
> @@ -1121,7 +1123,7 @@ atomic_add_unless(atomic_t *v, int a, int u)
>   * Atomically increments @v by 1, if @v is non-zero.
>   * Returns true if the increment was done.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic_inc_not_zero(atomic_t *v)
>  {
>  	return atomic_add_unless(v, 1, 0);
> @@ -1130,7 +1132,7 @@ atomic_inc_not_zero(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_inc_unless_negative
> -static inline bool
> +static __always_inline bool
>  atomic_inc_unless_negative(atomic_t *v)
>  {
>  	int c = atomic_read(v);
> @@ -1146,7 +1148,7 @@ atomic_inc_unless_negative(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_unless_positive
> -static inline bool
> +static __always_inline bool
>  atomic_dec_unless_positive(atomic_t *v)
>  {
>  	int c = atomic_read(v);
> @@ -1162,7 +1164,7 @@ atomic_dec_unless_positive(atomic_t *v)
>  #endif
>  
>  #ifndef atomic_dec_if_positive
> -static inline int
> +static __always_inline int
>  atomic_dec_if_positive(atomic_t *v)
>  {
>  	int dec, c = atomic_read(v);
> @@ -1186,7 +1188,7 @@ atomic_dec_if_positive(atomic_t *v)
>  #endif
>  
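(Interjecting between the atomic_t and atomic64_t halves of the conversion: the reason every one of these wrappers needs `__always_inline` rather than `inline` can be shown in a few lines. This is an illustrative sketch, not kernel code; the function names are made up.)

```c
/* Illustrative sketch, not kernel code: why the patch replaces
 * "inline" with "__always_inline".  Plain "inline" is only a hint;
 * when the compiler declines to inline (e.g. at -O0 or under odd
 * inlining heuristics), it emits an out-of-line copy with an
 * mcount/fentry call site that ftrace can patch -- the helper
 * silently becomes traceable, which must not happen for code that
 * runs before nmi_enter() / inside exception entry.
 */
static inline int maybe_outlined(int x)
{
	return x + 1;	/* may survive as a real, traceable function */
}

static inline __attribute__((__always_inline__)) int always_inlined(int x)
{
	return x + 1;	/* always folded into the caller; no symbol to trace */
}

int caller(int x)
{
	return maybe_outlined(x) + always_inlined(x);
}
```

The attribute makes non-inlining a hard compile error instead of a silent fallback, so the notrace property cannot be lost to a compiler decision.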
>  #ifndef atomic64_read_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_read_acquire(const atomic64_t *v)
>  {
>  	return smp_load_acquire(&(v)->counter);
> @@ -1195,7 +1197,7 @@ atomic64_read_acquire(const atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_set_release
> -static inline void
> +static __always_inline void
>  atomic64_set_release(atomic64_t *v, s64 i)
>  {
>  	smp_store_release(&(v)->counter, i);
> @@ -1210,7 +1212,7 @@ atomic64_set_release(atomic64_t *v, s64 i)
>  #else /* atomic64_add_return_relaxed */
>  
>  #ifndef atomic64_add_return_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_add_return_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_add_return_relaxed(i, v);
> @@ -1221,7 +1223,7 @@ atomic64_add_return_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_add_return_release
> -static inline s64
> +static __always_inline s64
>  atomic64_add_return_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1231,7 +1233,7 @@ atomic64_add_return_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_add_return
> -static inline s64
> +static __always_inline s64
>  atomic64_add_return(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1252,7 +1254,7 @@ atomic64_add_return(s64 i, atomic64_t *v)
>  #else /* atomic64_fetch_add_relaxed */
>  
>  #ifndef atomic64_fetch_add_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_add_relaxed(i, v);
> @@ -1263,7 +1265,7 @@ atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_add_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_add_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1273,7 +1275,7 @@ atomic64_fetch_add_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_add
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_add(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1294,7 +1296,7 @@ atomic64_fetch_add(s64 i, atomic64_t *v)
>  #else /* atomic64_sub_return_relaxed */
>  
>  #ifndef atomic64_sub_return_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_sub_return_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_sub_return_relaxed(i, v);
> @@ -1305,7 +1307,7 @@ atomic64_sub_return_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_sub_return_release
> -static inline s64
> +static __always_inline s64
>  atomic64_sub_return_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1315,7 +1317,7 @@ atomic64_sub_return_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_sub_return
> -static inline s64
> +static __always_inline s64
>  atomic64_sub_return(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1336,7 +1338,7 @@ atomic64_sub_return(s64 i, atomic64_t *v)
>  #else /* atomic64_fetch_sub_relaxed */
>  
>  #ifndef atomic64_fetch_sub_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_sub_relaxed(i, v);
> @@ -1347,7 +1349,7 @@ atomic64_fetch_sub_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_sub_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_sub_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1357,7 +1359,7 @@ atomic64_fetch_sub_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_sub
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_sub(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1372,7 +1374,7 @@ atomic64_fetch_sub(s64 i, atomic64_t *v)
>  #endif /* atomic64_fetch_sub_relaxed */
>  
>  #ifndef atomic64_inc
> -static inline void
> +static __always_inline void
>  atomic64_inc(atomic64_t *v)
>  {
>  	atomic64_add(1, v);
> @@ -1388,7 +1390,7 @@ atomic64_inc(atomic64_t *v)
>  #endif /* atomic64_inc_return */
>  
>  #ifndef atomic64_inc_return
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return(atomic64_t *v)
>  {
>  	return atomic64_add_return(1, v);
> @@ -1397,7 +1399,7 @@ atomic64_inc_return(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_inc_return_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return_acquire(atomic64_t *v)
>  {
>  	return atomic64_add_return_acquire(1, v);
> @@ -1406,7 +1408,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_inc_return_release
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return_release(atomic64_t *v)
>  {
>  	return atomic64_add_return_release(1, v);
> @@ -1415,7 +1417,7 @@ atomic64_inc_return_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_inc_return_relaxed
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return_relaxed(atomic64_t *v)
>  {
>  	return atomic64_add_return_relaxed(1, v);
> @@ -1426,7 +1428,7 @@ atomic64_inc_return_relaxed(atomic64_t *v)
>  #else /* atomic64_inc_return_relaxed */
>  
>  #ifndef atomic64_inc_return_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return_acquire(atomic64_t *v)
>  {
>  	s64 ret = atomic64_inc_return_relaxed(v);
> @@ -1437,7 +1439,7 @@ atomic64_inc_return_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_inc_return_release
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return_release(atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1447,7 +1449,7 @@ atomic64_inc_return_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_inc_return
> -static inline s64
> +static __always_inline s64
>  atomic64_inc_return(atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1469,7 +1471,7 @@ atomic64_inc_return(atomic64_t *v)
>  #endif /* atomic64_fetch_inc */
>  
>  #ifndef atomic64_fetch_inc
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc(atomic64_t *v)
>  {
>  	return atomic64_fetch_add(1, v);
> @@ -1478,7 +1480,7 @@ atomic64_fetch_inc(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_inc_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc_acquire(atomic64_t *v)
>  {
>  	return atomic64_fetch_add_acquire(1, v);
> @@ -1487,7 +1489,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_inc_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc_release(atomic64_t *v)
>  {
>  	return atomic64_fetch_add_release(1, v);
> @@ -1496,7 +1498,7 @@ atomic64_fetch_inc_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_inc_relaxed
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc_relaxed(atomic64_t *v)
>  {
>  	return atomic64_fetch_add_relaxed(1, v);
> @@ -1507,7 +1509,7 @@ atomic64_fetch_inc_relaxed(atomic64_t *v)
>  #else /* atomic64_fetch_inc_relaxed */
>  
>  #ifndef atomic64_fetch_inc_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc_acquire(atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_inc_relaxed(v);
> @@ -1518,7 +1520,7 @@ atomic64_fetch_inc_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_inc_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc_release(atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1528,7 +1530,7 @@ atomic64_fetch_inc_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_inc
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_inc(atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1543,7 +1545,7 @@ atomic64_fetch_inc(atomic64_t *v)
>  #endif /* atomic64_fetch_inc_relaxed */
>  
>  #ifndef atomic64_dec
> -static inline void
> +static __always_inline void
>  atomic64_dec(atomic64_t *v)
>  {
>  	atomic64_sub(1, v);
> @@ -1559,7 +1561,7 @@ atomic64_dec(atomic64_t *v)
>  #endif /* atomic64_dec_return */
>  
>  #ifndef atomic64_dec_return
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return(atomic64_t *v)
>  {
>  	return atomic64_sub_return(1, v);
> @@ -1568,7 +1570,7 @@ atomic64_dec_return(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_return_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return_acquire(atomic64_t *v)
>  {
>  	return atomic64_sub_return_acquire(1, v);
> @@ -1577,7 +1579,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_return_release
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return_release(atomic64_t *v)
>  {
>  	return atomic64_sub_return_release(1, v);
> @@ -1586,7 +1588,7 @@ atomic64_dec_return_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_return_relaxed
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return_relaxed(atomic64_t *v)
>  {
>  	return atomic64_sub_return_relaxed(1, v);
> @@ -1597,7 +1599,7 @@ atomic64_dec_return_relaxed(atomic64_t *v)
>  #else /* atomic64_dec_return_relaxed */
>  
>  #ifndef atomic64_dec_return_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return_acquire(atomic64_t *v)
>  {
>  	s64 ret = atomic64_dec_return_relaxed(v);
> @@ -1608,7 +1610,7 @@ atomic64_dec_return_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_return_release
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return_release(atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1618,7 +1620,7 @@ atomic64_dec_return_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_return
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_return(atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1640,7 +1642,7 @@ atomic64_dec_return(atomic64_t *v)
>  #endif /* atomic64_fetch_dec */
>  
>  #ifndef atomic64_fetch_dec
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec(atomic64_t *v)
>  {
>  	return atomic64_fetch_sub(1, v);
> @@ -1649,7 +1651,7 @@ atomic64_fetch_dec(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_dec_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec_acquire(atomic64_t *v)
>  {
>  	return atomic64_fetch_sub_acquire(1, v);
> @@ -1658,7 +1660,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_dec_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec_release(atomic64_t *v)
>  {
>  	return atomic64_fetch_sub_release(1, v);
> @@ -1667,7 +1669,7 @@ atomic64_fetch_dec_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_dec_relaxed
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec_relaxed(atomic64_t *v)
>  {
>  	return atomic64_fetch_sub_relaxed(1, v);
> @@ -1678,7 +1680,7 @@ atomic64_fetch_dec_relaxed(atomic64_t *v)
>  #else /* atomic64_fetch_dec_relaxed */
>  
>  #ifndef atomic64_fetch_dec_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec_acquire(atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_dec_relaxed(v);
> @@ -1689,7 +1691,7 @@ atomic64_fetch_dec_acquire(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_dec_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec_release(atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1699,7 +1701,7 @@ atomic64_fetch_dec_release(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_dec
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_dec(atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1720,7 +1722,7 @@ atomic64_fetch_dec(atomic64_t *v)
>  #else /* atomic64_fetch_and_relaxed */
>  
>  #ifndef atomic64_fetch_and_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_and_relaxed(i, v);
> @@ -1731,7 +1733,7 @@ atomic64_fetch_and_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_and_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_and_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1741,7 +1743,7 @@ atomic64_fetch_and_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_and
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_and(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1756,7 +1758,7 @@ atomic64_fetch_and(s64 i, atomic64_t *v)
>  #endif /* atomic64_fetch_and_relaxed */
>  
>  #ifndef atomic64_andnot
> -static inline void
> +static __always_inline void
>  atomic64_andnot(s64 i, atomic64_t *v)
>  {
>  	atomic64_and(~i, v);
> @@ -1772,7 +1774,7 @@ atomic64_andnot(s64 i, atomic64_t *v)
>  #endif /* atomic64_fetch_andnot */
>  
>  #ifndef atomic64_fetch_andnot
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot(s64 i, atomic64_t *v)
>  {
>  	return atomic64_fetch_and(~i, v);
> @@ -1781,7 +1783,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_andnot_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
>  {
>  	return atomic64_fetch_and_acquire(~i, v);
> @@ -1790,7 +1792,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_andnot_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
>  {
>  	return atomic64_fetch_and_release(~i, v);
> @@ -1799,7 +1801,7 @@ atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_andnot_relaxed
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
>  {
>  	return atomic64_fetch_and_relaxed(~i, v);
> @@ -1810,7 +1812,7 @@ atomic64_fetch_andnot_relaxed(s64 i, atomic64_t *v)
>  #else /* atomic64_fetch_andnot_relaxed */
>  
>  #ifndef atomic64_fetch_andnot_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_andnot_relaxed(i, v);
> @@ -1821,7 +1823,7 @@ atomic64_fetch_andnot_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_andnot_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1831,7 +1833,7 @@ atomic64_fetch_andnot_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_andnot
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_andnot(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1852,7 +1854,7 @@ atomic64_fetch_andnot(s64 i, atomic64_t *v)
>  #else /* atomic64_fetch_or_relaxed */
>  
>  #ifndef atomic64_fetch_or_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_or_relaxed(i, v);
> @@ -1863,7 +1865,7 @@ atomic64_fetch_or_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_or_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_or_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1873,7 +1875,7 @@ atomic64_fetch_or_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_or
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_or(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1894,7 +1896,7 @@ atomic64_fetch_or(s64 i, atomic64_t *v)
>  #else /* atomic64_fetch_xor_relaxed */
>  
>  #ifndef atomic64_fetch_xor_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
>  {
>  	s64 ret = atomic64_fetch_xor_relaxed(i, v);
> @@ -1905,7 +1907,7 @@ atomic64_fetch_xor_acquire(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_xor_release
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_xor_release(s64 i, atomic64_t *v)
>  {
>  	__atomic_release_fence();
> @@ -1915,7 +1917,7 @@ atomic64_fetch_xor_release(s64 i, atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_fetch_xor
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_xor(s64 i, atomic64_t *v)
>  {
>  	s64 ret;
> @@ -1936,7 +1938,7 @@ atomic64_fetch_xor(s64 i, atomic64_t *v)
>  #else /* atomic64_xchg_relaxed */
>  
>  #ifndef atomic64_xchg_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_xchg_acquire(atomic64_t *v, s64 i)
>  {
>  	s64 ret = atomic64_xchg_relaxed(v, i);
> @@ -1947,7 +1949,7 @@ atomic64_xchg_acquire(atomic64_t *v, s64 i)
>  #endif
>  
>  #ifndef atomic64_xchg_release
> -static inline s64
> +static __always_inline s64
>  atomic64_xchg_release(atomic64_t *v, s64 i)
>  {
>  	__atomic_release_fence();
> @@ -1957,7 +1959,7 @@ atomic64_xchg_release(atomic64_t *v, s64 i)
>  #endif
>  
>  #ifndef atomic64_xchg
> -static inline s64
> +static __always_inline s64
>  atomic64_xchg(atomic64_t *v, s64 i)
>  {
>  	s64 ret;
> @@ -1978,7 +1980,7 @@ atomic64_xchg(atomic64_t *v, s64 i)
>  #else /* atomic64_cmpxchg_relaxed */
>  
>  #ifndef atomic64_cmpxchg_acquire
> -static inline s64
> +static __always_inline s64
>  atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
>  {
>  	s64 ret = atomic64_cmpxchg_relaxed(v, old, new);
> @@ -1989,7 +1991,7 @@ atomic64_cmpxchg_acquire(atomic64_t *v, s64 old, s64 new)
>  #endif
>  
>  #ifndef atomic64_cmpxchg_release
> -static inline s64
> +static __always_inline s64
>  atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
>  {
>  	__atomic_release_fence();
> @@ -1999,7 +2001,7 @@ atomic64_cmpxchg_release(atomic64_t *v, s64 old, s64 new)
>  #endif
>  
>  #ifndef atomic64_cmpxchg
> -static inline s64
> +static __always_inline s64
>  atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
>  {
>  	s64 ret;
> @@ -2021,7 +2023,7 @@ atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
>  #endif /* atomic64_try_cmpxchg */
>  
>  #ifndef atomic64_try_cmpxchg
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
>  {
>  	s64 r, o = *old;
> @@ -2034,7 +2036,7 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
>  #endif
>  
>  #ifndef atomic64_try_cmpxchg_acquire
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
>  {
>  	s64 r, o = *old;
> @@ -2047,7 +2049,7 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
>  #endif
>  
>  #ifndef atomic64_try_cmpxchg_release
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
>  {
>  	s64 r, o = *old;
> @@ -2060,7 +2062,7 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
>  #endif
>  
>  #ifndef atomic64_try_cmpxchg_relaxed
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
>  {
>  	s64 r, o = *old;
> @@ -2075,7 +2077,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
>  #else /* atomic64_try_cmpxchg_relaxed */
>  
>  #ifndef atomic64_try_cmpxchg_acquire
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
>  {
>  	bool ret = atomic64_try_cmpxchg_relaxed(v, old, new);
> @@ -2086,7 +2088,7 @@ atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
>  #endif
>  
>  #ifndef atomic64_try_cmpxchg_release
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
>  {
>  	__atomic_release_fence();
> @@ -2096,7 +2098,7 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
>  #endif
>  
>  #ifndef atomic64_try_cmpxchg
> -static inline bool
> +static __always_inline bool
>  atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
>  {
>  	bool ret;
> @@ -2120,7 +2122,7 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
>   * true if the result is zero, or false for all
>   * other cases.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic64_sub_and_test(s64 i, atomic64_t *v)
>  {
>  	return atomic64_sub_return(i, v) == 0;
> @@ -2137,7 +2139,7 @@ atomic64_sub_and_test(s64 i, atomic64_t *v)
>   * returns true if the result is 0, or false for all other
>   * cases.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic64_dec_and_test(atomic64_t *v)
>  {
>  	return atomic64_dec_return(v) == 0;
> @@ -2154,7 +2156,7 @@ atomic64_dec_and_test(atomic64_t *v)
>   * and returns true if the result is zero, or false for all
>   * other cases.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic64_inc_and_test(atomic64_t *v)
>  {
>  	return atomic64_inc_return(v) == 0;
> @@ -2172,7 +2174,7 @@ atomic64_inc_and_test(atomic64_t *v)
>   * if the result is negative, or false when
>   * result is greater than or equal to zero.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic64_add_negative(s64 i, atomic64_t *v)
>  {
>  	return atomic64_add_return(i, v) < 0;
> @@ -2190,7 +2192,7 @@ atomic64_add_negative(s64 i, atomic64_t *v)
>   * Atomically adds @a to @v, so long as @v was not already @u.
>   * Returns original value of @v
>   */
> -static inline s64
> +static __always_inline s64
>  atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
>  {
>  	s64 c = atomic64_read(v);
> @@ -2215,7 +2217,7 @@ atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
>   * Atomically adds @a to @v, if @v was not already @u.
>   * Returns true if the addition was done.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
>  {
>  	return atomic64_fetch_add_unless(v, a, u) != u;
> @@ -2231,7 +2233,7 @@ atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
>   * Atomically increments @v by 1, if @v is non-zero.
>   * Returns true if the increment was done.
>   */
> -static inline bool
> +static __always_inline bool
>  atomic64_inc_not_zero(atomic64_t *v)
>  {
>  	return atomic64_add_unless(v, 1, 0);
> @@ -2240,7 +2242,7 @@ atomic64_inc_not_zero(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_inc_unless_negative
> -static inline bool
> +static __always_inline bool
>  atomic64_inc_unless_negative(atomic64_t *v)
>  {
>  	s64 c = atomic64_read(v);
> @@ -2256,7 +2258,7 @@ atomic64_inc_unless_negative(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_unless_positive
> -static inline bool
> +static __always_inline bool
>  atomic64_dec_unless_positive(atomic64_t *v)
>  {
>  	s64 c = atomic64_read(v);
> @@ -2272,7 +2274,7 @@ atomic64_dec_unless_positive(atomic64_t *v)
>  #endif
>  
>  #ifndef atomic64_dec_if_positive
> -static inline s64
> +static __always_inline s64
>  atomic64_dec_if_positive(atomic64_t *v)
>  {
>  	s64 dec, c = atomic64_read(v);
> @@ -2292,4 +2294,4 @@ atomic64_dec_if_positive(atomic64_t *v)
>  #define atomic64_cond_read_relaxed(v, c) smp_cond_load_relaxed(&(v)->counter, (c))
>  
>  #endif /* _LINUX_ATOMIC_FALLBACK_H */
> -// 25de4a2804d70f57e994fe3b419148658bb5378a
> +// baaf45f4c24ed88ceae58baca39d7fd80bb8101b
> diff --git a/scripts/atomic/fallbacks/acquire b/scripts/atomic/fallbacks/acquire
> index e38871e64db6..ea489acc285e 100755
> --- a/scripts/atomic/fallbacks/acquire
> +++ b/scripts/atomic/fallbacks/acquire
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_${pfx}${name}${sfx}_acquire(${params})
>  {
>  	${ret} ret = ${atomic}_${pfx}${name}${sfx}_relaxed(${args});
> diff --git a/scripts/atomic/fallbacks/add_negative b/scripts/atomic/fallbacks/add_negative
> index e6f4815637de..03cc2e07fac5 100755
> --- a/scripts/atomic/fallbacks/add_negative
> +++ b/scripts/atomic/fallbacks/add_negative
> @@ -8,7 +8,7 @@ cat <<EOF
>   * if the result is negative, or false when
>   * result is greater than or equal to zero.
>   */
> -static inline bool
> +static __always_inline bool
>  ${atomic}_add_negative(${int} i, ${atomic}_t *v)
>  {
>  	return ${atomic}_add_return(i, v) < 0;
> diff --git a/scripts/atomic/fallbacks/add_unless b/scripts/atomic/fallbacks/add_unless
> index 792533885fbf..daf87a04c850 100755
> --- a/scripts/atomic/fallbacks/add_unless
> +++ b/scripts/atomic/fallbacks/add_unless
> @@ -8,7 +8,7 @@ cat << EOF
>   * Atomically adds @a to @v, if @v was not already @u.
>   * Returns true if the addition was done.
>   */
> -static inline bool
> +static __always_inline bool
>  ${atomic}_add_unless(${atomic}_t *v, ${int} a, ${int} u)
>  {
>  	return ${atomic}_fetch_add_unless(v, a, u) != u;
> diff --git a/scripts/atomic/fallbacks/andnot b/scripts/atomic/fallbacks/andnot
> index 9f3a3216b5e3..14efce01225a 100755
> --- a/scripts/atomic/fallbacks/andnot
> +++ b/scripts/atomic/fallbacks/andnot
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_${pfx}andnot${sfx}${order}(${int} i, ${atomic}_t *v)
>  {
>  	${retstmt}${atomic}_${pfx}and${sfx}${order}(~i, v);
> diff --git a/scripts/atomic/fallbacks/dec b/scripts/atomic/fallbacks/dec
> index 10bbc82be31d..118282f3a5a3 100755
> --- a/scripts/atomic/fallbacks/dec
> +++ b/scripts/atomic/fallbacks/dec
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_${pfx}dec${sfx}${order}(${atomic}_t *v)
>  {
>  	${retstmt}${atomic}_${pfx}sub${sfx}${order}(1, v);
> diff --git a/scripts/atomic/fallbacks/dec_and_test b/scripts/atomic/fallbacks/dec_and_test
> index 0ce7103b3df2..f8967a891117 100755
> --- a/scripts/atomic/fallbacks/dec_and_test
> +++ b/scripts/atomic/fallbacks/dec_and_test
> @@ -7,7 +7,7 @@ cat <<EOF
>   * returns true if the result is 0, or false for all other
>   * cases.
>   */
> -static inline bool
> +static __always_inline bool
>  ${atomic}_dec_and_test(${atomic}_t *v)
>  {
>  	return ${atomic}_dec_return(v) == 0;
> diff --git a/scripts/atomic/fallbacks/dec_if_positive b/scripts/atomic/fallbacks/dec_if_positive
> index c52eacec43c8..cfb380bd2da6 100755
> --- a/scripts/atomic/fallbacks/dec_if_positive
> +++ b/scripts/atomic/fallbacks/dec_if_positive
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_dec_if_positive(${atomic}_t *v)
>  {
>  	${int} dec, c = ${atomic}_read(v);
> diff --git a/scripts/atomic/fallbacks/dec_unless_positive b/scripts/atomic/fallbacks/dec_unless_positive
> index 8a2578f14268..69cb7aa01f9c 100755
> --- a/scripts/atomic/fallbacks/dec_unless_positive
> +++ b/scripts/atomic/fallbacks/dec_unless_positive
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline bool
> +static __always_inline bool
>  ${atomic}_dec_unless_positive(${atomic}_t *v)
>  {
>  	${int} c = ${atomic}_read(v);
> diff --git a/scripts/atomic/fallbacks/fence b/scripts/atomic/fallbacks/fence
> index 82f68fa6931a..92a3a4691bab 100755
> --- a/scripts/atomic/fallbacks/fence
> +++ b/scripts/atomic/fallbacks/fence
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_${pfx}${name}${sfx}(${params})
>  {
>  	${ret} ret;
> diff --git a/scripts/atomic/fallbacks/fetch_add_unless b/scripts/atomic/fallbacks/fetch_add_unless
> index d2c091db7eae..fffbc0d16fdf 100755
> --- a/scripts/atomic/fallbacks/fetch_add_unless
> +++ b/scripts/atomic/fallbacks/fetch_add_unless
> @@ -8,7 +8,7 @@ cat << EOF
>   * Atomically adds @a to @v, so long as @v was not already @u.
>   * Returns original value of @v
>   */
> -static inline ${int}
> +static __always_inline ${int}
>  ${atomic}_fetch_add_unless(${atomic}_t *v, ${int} a, ${int} u)
>  {
>  	${int} c = ${atomic}_read(v);
> diff --git a/scripts/atomic/fallbacks/inc b/scripts/atomic/fallbacks/inc
> index f866b3ad2353..10751cd62829 100755
> --- a/scripts/atomic/fallbacks/inc
> +++ b/scripts/atomic/fallbacks/inc
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_${pfx}inc${sfx}${order}(${atomic}_t *v)
>  {
>  	${retstmt}${atomic}_${pfx}add${sfx}${order}(1, v);
> diff --git a/scripts/atomic/fallbacks/inc_and_test b/scripts/atomic/fallbacks/inc_and_test
> index 4e2068869f7e..4acea9c93604 100755
> --- a/scripts/atomic/fallbacks/inc_and_test
> +++ b/scripts/atomic/fallbacks/inc_and_test
> @@ -7,7 +7,7 @@ cat <<EOF
>   * and returns true if the result is zero, or false for all
>   * other cases.
>   */
> -static inline bool
> +static __always_inline bool
>  ${atomic}_inc_and_test(${atomic}_t *v)
>  {
>  	return ${atomic}_inc_return(v) == 0;
> diff --git a/scripts/atomic/fallbacks/inc_not_zero b/scripts/atomic/fallbacks/inc_not_zero
> index a7c45c8d107c..d9f7b97aab42 100755
> --- a/scripts/atomic/fallbacks/inc_not_zero
> +++ b/scripts/atomic/fallbacks/inc_not_zero
> @@ -6,7 +6,7 @@ cat <<EOF
>   * Atomically increments @v by 1, if @v is non-zero.
>   * Returns true if the increment was done.
>   */
> -static inline bool
> +static __always_inline bool
>  ${atomic}_inc_not_zero(${atomic}_t *v)
>  {
>  	return ${atomic}_add_unless(v, 1, 0);
> diff --git a/scripts/atomic/fallbacks/inc_unless_negative b/scripts/atomic/fallbacks/inc_unless_negative
> index 0c266e71dbd4..177a7cb51eda 100755
> --- a/scripts/atomic/fallbacks/inc_unless_negative
> +++ b/scripts/atomic/fallbacks/inc_unless_negative
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline bool
> +static __always_inline bool
>  ${atomic}_inc_unless_negative(${atomic}_t *v)
>  {
>  	${int} c = ${atomic}_read(v);
> diff --git a/scripts/atomic/fallbacks/read_acquire b/scripts/atomic/fallbacks/read_acquire
> index 75863b5203f7..12fa83cb3a6d 100755
> --- a/scripts/atomic/fallbacks/read_acquire
> +++ b/scripts/atomic/fallbacks/read_acquire
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_read_acquire(const ${atomic}_t *v)
>  {
>  	return smp_load_acquire(&(v)->counter);
> diff --git a/scripts/atomic/fallbacks/release b/scripts/atomic/fallbacks/release
> index 3f628a3802d9..730d2a6d3e07 100755
> --- a/scripts/atomic/fallbacks/release
> +++ b/scripts/atomic/fallbacks/release
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline ${ret}
> +static __always_inline ${ret}
>  ${atomic}_${pfx}${name}${sfx}_release(${params})
>  {
>  	__atomic_release_fence();
> diff --git a/scripts/atomic/fallbacks/set_release b/scripts/atomic/fallbacks/set_release
> index 45bb5e0cfc08..e5d72c717434 100755
> --- a/scripts/atomic/fallbacks/set_release
> +++ b/scripts/atomic/fallbacks/set_release
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline void
> +static __always_inline void
>  ${atomic}_set_release(${atomic}_t *v, ${int} i)
>  {
>  	smp_store_release(&(v)->counter, i);
> diff --git a/scripts/atomic/fallbacks/sub_and_test b/scripts/atomic/fallbacks/sub_and_test
> index 289ef17a2d7a..6cfe4ed49746 100755
> --- a/scripts/atomic/fallbacks/sub_and_test
> +++ b/scripts/atomic/fallbacks/sub_and_test
> @@ -8,7 +8,7 @@ cat <<EOF
>   * true if the result is zero, or false for all
>   * other cases.
>   */
> -static inline bool
> +static __always_inline bool
>  ${atomic}_sub_and_test(${int} i, ${atomic}_t *v)
>  {
>  	return ${atomic}_sub_return(i, v) == 0;
> diff --git a/scripts/atomic/fallbacks/try_cmpxchg b/scripts/atomic/fallbacks/try_cmpxchg
> index 4ed85e2f5378..c7a26213b978 100755
> --- a/scripts/atomic/fallbacks/try_cmpxchg
> +++ b/scripts/atomic/fallbacks/try_cmpxchg
> @@ -1,5 +1,5 @@
>  cat <<EOF
> -static inline bool
> +static __always_inline bool
>  ${atomic}_try_cmpxchg${order}(${atomic}_t *v, ${int} *old, ${int} new)
>  {
>  	${int} r, o = *old;
> diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
> index 1bd7c1707633..b6c6f5d306a7 100755
> --- a/scripts/atomic/gen-atomic-fallback.sh
> +++ b/scripts/atomic/gen-atomic-fallback.sh
> @@ -149,6 +149,8 @@ cat << EOF
>  #ifndef _LINUX_ATOMIC_FALLBACK_H
>  #define _LINUX_ATOMIC_FALLBACK_H
>  
> +#include <linux/compiler.h>
> +
>  EOF
>  
>  for xchg in "xchg" "cmpxchg" "cmpxchg64"; do
> 
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 15:31   ` Steven Rostedt
@ 2020-02-19 16:56     ` Borislav Petkov
  2020-02-19 17:07       ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Borislav Petkov @ 2020-02-19 16:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai,
	luto, tony.luck, frederic, dan.carpenter, mhiramat, Will Deacon,
	Marc Zyngier, Michael Ellerman, Petr Mladek

On Wed, Feb 19, 2020 at 10:31:26AM -0500, Steven Rostedt wrote:
> Probably should document somewhere (in a comment above nmi_enter()?)
> that we allow nmi_enter() to nest up to 15 times.

Yah, and can we make the BUG_ON() WARN_ON or so instead, so that there's
at least a chance to be able to catch it for debugging. Or is the box
going to be irreparably wedged after the 4 bits overflow?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  2020-02-19 16:37     ` Peter Zijlstra
  2020-02-19 16:45       ` Paul E. McKenney
@ 2020-02-19 17:03       ` Peter Zijlstra
  2020-02-19 17:42         ` Paul E. McKenney
  1 sibling, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 05:37:00PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:31:56AM -0800, Paul E. McKenney wrote:

> > Here is the latest version of that comment, posted by Steve Rostedt.
> > 
> > 							Thanx, Paul
> > 
> > /*
> >  * All functions called in the breakpoint trap handler (e.g. do_int3()
> >  * on x86), must not allow kprobes until the kprobe breakpoint handler
> >  * is called, otherwise it can cause an infinite recursion.
> >  * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> >  * before the kprobe breakpoint handler is called, thus it must be
> >  * marked as NOKPROBE.
> >  */
> 
> Oh right, let me stick that in a separate patch. Best we not lose that
> I suppose ;-)

Having gone over the old thread, I ended up with the below. Anyone
holler if I got it wrong somehow.

---
Subject: rcu: Provide comment for NOKPROBE() on rcu_nmi_enter()
From: Steven Rostedt <rostedt@goodmis.org>

The rcu_nmi_enter() function was marked NOKPROBE() by commit
c13324a505c77 ("x86/kprobes: Prohibit probing on functions before
kprobe_int3_handler()") because kprobe code called from do_int3() must
not be invoked before kprobe_int3_handler() is called.  It turns out
that ist_enter() (in do_int3()) calls rcu_nmi_enter(), hence the
marking NOKPROBE() being added to rcu_nmi_enter().

This commit therefore adds a comment documenting this line of
reasoning.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/rcu/tree.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -842,6 +842,14 @@ void rcu_nmi_enter(void)
 {
 	rcu_nmi_enter_common(false);
 }
+/*
+ * All functions called in the breakpoint trap handler (e.g. do_int3()
+ * on x86), must not allow kprobes until the kprobe breakpoint handler
+ * is called, otherwise it can cause an infinite recursion.
+ * On some archs, rcu_nmi_enter() is called in the breakpoint handler
+ * before the kprobe breakpoint handler is called, thus it must be
+ * marked as NOKPROBE.
+ */
 NOKPROBE_SYMBOL(rcu_nmi_enter);
 
 /**

* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 16:47     ` Peter Zijlstra
@ 2020-02-19 17:05       ` Peter Zijlstra
  2020-02-19 17:21         ` Steven Rostedt
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 05:47:36PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:43:56AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 19, 2020 at 03:47:37PM +0100, Peter Zijlstra wrote:
> > > Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
> > > rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
> > > taught perf how to deal with not having an RCU context provided.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  include/linux/tracepoint.h |    8 ++------
> > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > 
> > > --- a/include/linux/tracepoint.h
> > > +++ b/include/linux/tracepoint.h
> > > @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepo
> > 
> > Shouldn't we also get rid of this line above?
> > 
> > 		int __maybe_unused __idx = 0;				\
> > 
> 
> Probably makes a lot of sense, lemme fix that!

Oh wait, no! SRCU is the one that remains!

* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 16:27             ` Paul E. McKenney
  2020-02-19 16:34               ` Peter Zijlstra
@ 2020-02-19 17:05               ` Steven Rostedt
  1 sibling, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 17:05 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 08:27:47 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Or, we could just cut and paste the current memmove and make a notrace
> > version too. Then we don't need to worry about bugs like this.  
> 
> OK, I will bite...
> 
> Can we just make the core be an inline function and make a notrace and
> a trace caller?  Possibly going one step further and having one call
> the other?  (Presumably the traceable version invoking the notrace
> version, but it has been one good long time since I have looked at
> function preambles.)

Sure. Looking at the implementation (which is big and ugly), we could
have a

static __always_inline void *__memmove(...)
{
	[..]
}

__visible void *memmove(...)
{
	return __memmove(...);
}

__visible notrace void *memmove_notrace(...)
{
	return __memmove(...);
}

-- Steve

* Re: [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers
  2020-02-19 16:55   ` Paul E. McKenney
@ 2020-02-19 17:06     ` Peter Zijlstra
  2020-02-19 17:35       ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Mark Rutland, Marco Elver

On Wed, Feb 19, 2020 at 08:55:21AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 19, 2020 at 03:47:42PM +0100, Peter Zijlstra wrote:
> > While the fallback wrappers aren't pure wrappers, they are trivial
> > nonetheless, and the function they wrap should determine the final
> > inlining policy.
> > 
> > For x86 tinyconfig we observe:
> >  - vmlinux baseline: 1315988
> >  - vmlinux with patch: 1315928 (-60 bytes)
> > 
> > Suggested-by: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Marco Elver <elver@google.com>
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> And this one and the previous one are also already in -tip, FYI.

That's where I found them ;-) Stole them from tip/locking/kcsan.

* Re: [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 16:56     ` Borislav Petkov
@ 2020-02-19 17:07       ` Peter Zijlstra
  0 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai,
	luto, tony.luck, frederic, dan.carpenter, mhiramat, Will Deacon,
	Marc Zyngier, Michael Ellerman, Petr Mladek

On Wed, Feb 19, 2020 at 05:56:50PM +0100, Borislav Petkov wrote:
> On Wed, Feb 19, 2020 at 10:31:26AM -0500, Steven Rostedt wrote:
> > Probably should document somewhere (in a comment above nmi_enter()?)
> > that we allow nmi_enter() to nest up to 15 times.
> 
> Yah, and can we make the BUG_ON() WARN_ON or so instead, so that there's
> at least a chance to be able to catch it for debugging. Or is the box
> going to be irreparably wedged after the 4 bits overflow?

It's going to be fairly buggered, because at that point in_nmi() is
going to be false again. It might survive for a little, it might not.

* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 14:47 ` [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic() Peter Zijlstra
@ 2020-02-19 17:13   ` Borislav Petkov
  2020-02-19 17:21     ` Andy Lutomirski
  0 siblings, 1 reply; 99+ messages in thread
From: Borislav Petkov @ 2020-02-19 17:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 03:47:26PM +0100, Peter Zijlstra wrote:
> Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()

x86/mce: ...

> It is an abomination; and in preparation for removing the whole
> ist_enter() thing, it needs to go.
> 
> Convert #MC over to using task_work_add() instead; it will run the
> same code slightly later, on the return to user path of the same
> exception.

That's fine because the error happened in userspace.

...

> @@ -1202,6 +1186,29 @@ static void __mc_scan_banks(struct mce *
>  	*m = *final;
>  }
>  
> +static void mce_kill_me_now(struct callback_head *ch)
> +{
> +	force_sig(SIGBUS);
> +}
> +
> +static void mce_kill_me_maybe(struct callback_head *cb)

You don't even need the "mce_" prefixes - those are static functions and
in mce-land.

Change looks good otherwise.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE
  2020-02-19 16:31   ` Paul E. McKenney
  2020-02-19 16:37     ` Peter Zijlstra
@ 2020-02-19 17:16     ` Steven Rostedt
  2020-02-19 17:18       ` Joel Fernandes
                         ` (2 more replies)
  1 sibling, 3 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 17:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

It's confusing that rcu_nmi_enter() is marked NOKPROBE and
rcu_nmi_exit() is not. One may think that the exit needs to be marked
for the same reason the enter is, as rcu_nmi_exit() reverts the RCU
state back to what it was before rcu_nmi_enter(). But the reason has
nothing to do with the state of RCU.

The breakpoint handler (int3 on x86) must not have any kprobe on it
until the kprobe handler is called. Otherwise, it can cause an infinite
recursion and crash the machine. It just so happens that
rcu_nmi_enter() is called by the int3 handler before the kprobe handler
can run, and therefore needs to be marked as NOKPROBE.

Comment this to remove the confusion about why rcu_nmi_enter() is marked
NOKPROBE but rcu_nmi_exit() is not.

Link: https://lore.kernel.org/r/20200213163800.5c51a5f1@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1694a6b57ad8..ada7b2b638fb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -846,6 +846,14 @@ void rcu_nmi_enter(void)
 {
 	rcu_nmi_enter_common(false);
 }
+/*
+ * All functions called in the breakpoint trap handler (e.g. do_int3()
+ * on x86), must not allow kprobes until the kprobe breakpoint handler
+ * is called, otherwise it can cause an infinite recursion.
+ * On some archs, rcu_nmi_enter() is called in the breakpoint handler
+ * before the kprobe breakpoint handler is called, thus it must be
+ * marked as NOKPROBE.
+ */
 NOKPROBE_SYMBOL(rcu_nmi_enter);
 
 /**

* Re: [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE
  2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
@ 2020-02-19 17:18       ` Joel Fernandes
  2020-02-19 17:41       ` Paul E. McKenney
  2020-02-20  5:54       ` Masami Hiramatsu
  2 siblings, 0 replies; 99+ messages in thread
From: Joel Fernandes @ 2020-02-19 17:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Peter Zijlstra, LKML, linux-arch, Ingo Molnar,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Andy Lutomirski,
	Tony Luck, Frederic Weisbecker, Dan Carpenter, Masami Hiramatsu

On Wed, Feb 19, 2020 at 12:16 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>
> It's confusing that rcu_nmi_enter() is marked NOKPROBE and
> rcu_nmi_exit() is not. One may think that the exit needs to be marked
> for the same reason the enter is, as rcu_nmi_exit() reverts the RCU
> state back to what it was before rcu_nmi_enter(). But the reason has
> nothing to do with the state of RCU.
>
> The breakpoint handler (int3 on x86) must not have any kprobe on it
> until the kprobe handler is called. Otherwise, it can cause an infinite
> recursion and crash the machine. It just so happens that
> rcu_nmi_enter() is called by the int3 handler before the kprobe handler
> can run, and therefore needs to be marked as NOKPROBE.
>
> Comment this to remove the confusion about why rcu_nmi_enter() is marked
> NOKPROBE but rcu_nmi_exit() is not.
>
> Link: https://lore.kernel.org/r/20200213163800.5c51a5f1@gandalf.local.home
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel


> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1694a6b57ad8..ada7b2b638fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -846,6 +846,14 @@ void rcu_nmi_enter(void)
>  {
>         rcu_nmi_enter_common(false);
>  }
> +/*
> + * All functions called in the breakpoint trap handler (e.g. do_int3()
> + * on x86), must not allow kprobes until the kprobe breakpoint handler
> + * is called, otherwise it can cause an infinite recursion.
> + * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> + * before the kprobe breakpoint handler is called, thus it must be
> + * marked as NOKPROBE.
> + */
>  NOKPROBE_SYMBOL(rcu_nmi_enter);
>
>  /**

* Re: [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline
  2020-02-19 16:39   ` Paul E. McKenney
@ 2020-02-19 17:19     ` Steven Rostedt
  0 siblings, 0 replies; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 17:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 08:39:34 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> There was some controversy over inline vs. notrace, leading me to
> ask whether we should use both inline and notrace here.  ;-)

"inline" implicitly suggests "notrace". The reason is that there
were "surprises" when gcc decided not to inline various functions
marked as "inline", which caused ftrace to break. I figured that if
someone marks something as "inline", it should not be traced
regardless of whether gcc decides to inline it.
-- Steve


* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-19 16:30     ` Peter Zijlstra
  2020-02-19 16:51       ` Peter Zijlstra
@ 2020-02-19 17:20       ` Peter Zijlstra
  2020-02-20 10:37         ` Dmitry Vyukov
  1 sibling, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:20 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Wed, Feb 19, 2020 at 05:30:25PM +0100, Peter Zijlstra wrote:

> By inlining everything in poke_int3_handler() (except bsearch :/) we can
> mark the whole function off limits to everything and call it a day. That
> simplicity has been the guiding principle so far.
> 
> Alternatively we can provide an __always_inline variant of bsearch().

This reduces the __no_sanitize usage to just the exception entry
(do_int3) and the critical function: poke_int3_handler().

Is this more acceptable?

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -979,7 +979,7 @@ static __always_inline void *text_poke_a
 	return _stext + tp->rel_addr;
 }
 
-static int notrace __no_sanitize patch_cmp(const void *key, const void *elt)
+static __always_inline int patch_cmp(const void *key, const void *elt)
 {
 	struct text_poke_loc *tp = (struct text_poke_loc *) elt;
 
@@ -989,7 +989,6 @@ static int notrace __no_sanitize patch_c
 		return 1;
 	return 0;
 }
-NOKPROBE_SYMBOL(patch_cmp);
 
 int notrace __no_sanitize poke_int3_handler(struct pt_regs *regs)
 {
@@ -1024,9 +1023,9 @@ int notrace __no_sanitize poke_int3_hand
 	 * Skip the binary search if there is a single member in the vector.
 	 */
 	if (unlikely(desc->nr_entries > 1)) {
-		tp = bsearch(ip, desc->vec, desc->nr_entries,
-			     sizeof(struct text_poke_loc),
-			     patch_cmp);
+		tp = __bsearch(ip, desc->vec, desc->nr_entries,
+			       sizeof(struct text_poke_loc),
+			       patch_cmp);
 		if (!tp)
 			goto out_put;
 	} else {
--- a/include/linux/bsearch.h
+++ b/include/linux/bsearch.h
@@ -4,7 +4,29 @@
 
 #include <linux/types.h>
 
-void *bsearch(const void *key, const void *base, size_t num, size_t size,
-	      cmp_func_t cmp);
+static __always_inline
+void *__bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp)
+{
+	const char *pivot;
+	int result;
+
+	while (num > 0) {
+		pivot = base + (num >> 1) * size;
+		result = cmp(key, pivot);
+
+		if (result == 0)
+			return (void *)pivot;
+
+		if (result > 0) {
+			base = pivot + size;
+			num--;
+		}
+		num >>= 1;
+	}
+
+	return NULL;
+}
+
+extern void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp);
 
 #endif /* _LINUX_BSEARCH_H */
--- a/lib/bsearch.c
+++ b/lib/bsearch.c
@@ -28,27 +28,9 @@
  * the key and elements in the array are of the same type, you can use
  * the same comparison function for both sort() and bsearch().
  */
-void __no_sanitize *bsearch(const void *key, const void *base, size_t num, size_t size,
-	      cmp_func_t cmp)
+void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp)
 {
-	const char *pivot;
-	int result;
-
-	while (num > 0) {
-		pivot = base + (num >> 1) * size;
-		result = cmp(key, pivot);
-
-		if (result == 0)
-			return (void *)pivot;
-
-		if (result > 0) {
-			base = pivot + size;
-			num--;
-		}
-		num >>= 1;
-	}
-
-	return NULL;
> +	return __bsearch(key, base, num, size, cmp);
 }
 EXPORT_SYMBOL(bsearch);
 NOKPROBE_SYMBOL(bsearch);


* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 17:05       ` Peter Zijlstra
@ 2020-02-19 17:21         ` Steven Rostedt
  2020-02-19 17:40           ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 17:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 18:05:07 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Feb 19, 2020 at 05:47:36PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 19, 2020 at 08:43:56AM -0800, Paul E. McKenney wrote:  
> > > On Wed, Feb 19, 2020 at 03:47:37PM +0100, Peter Zijlstra wrote:  
> > > > Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
> > > > rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
> > > > taught perf how to deal with not having an RCU context provided.
> > > > 
> > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > ---
> > > >  include/linux/tracepoint.h |    8 ++------
> > > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > > 
> > > > --- a/include/linux/tracepoint.h
> > > > +++ b/include/linux/tracepoint.h
> > > > @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepo  
> > > 
> > > Shouldn't we also get rid of this line above?
> > > 
> > > 		int __maybe_unused __idx = 0;				\
> > >   
> > 
> > Probably makes a lot of sense, lemme fix that!  
> 
> Oh wait, no! SRCU is the one that remains !

Correct, and if rcuidle is not set, and this is a macro, the SRCU
portion is compiled out.

-- Steve


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 17:13   ` Borislav Petkov
@ 2020-02-19 17:21     ` Andy Lutomirski
  2020-02-19 17:33       ` Peter Zijlstra
  2020-02-19 17:42       ` Borislav Petkov
  0 siblings, 2 replies; 99+ messages in thread
From: Andy Lutomirski @ 2020-02-19 17:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, LKML, linux-arch, Steven Rostedt, Ingo Molnar,
	Joel Fernandes, Greg KH, gustavo, Thomas Gleixner, paulmck,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Andrew Lutomirski, Tony Luck, Frederic Weisbecker, Dan Carpenter,
	Masami Hiramatsu

On Wed, Feb 19, 2020 at 9:13 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Feb 19, 2020 at 03:47:26PM +0100, Peter Zijlstra wrote:
> > Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
>
> x86/mce: ...
>
> > It is an abomination; and in preparation of removing the whole
> > ist_enter() thing, it needs to go.
> >
> > Convert #MC over to using task_work_add() instead; it will run the
> > same code slightly later, on the return to user path of the same
> > exception.
>
> That's fine because the error happened in userspace.

Unless there is a signal pending and the signal setup code is about to
hit the same failed memory.  I suppose we can just treat cases like
this as "oh well, time to kill the whole system".

But we should genuinely agree that we're okay with deferring this handling.


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 17:21     ` Andy Lutomirski
@ 2020-02-19 17:33       ` Peter Zijlstra
  2020-02-19 22:12         ` Andy Lutomirski
  2020-02-19 17:42       ` Borislav Petkov
  1 sibling, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, LKML, linux-arch, Steven Rostedt, Ingo Molnar,
	Joel Fernandes, Greg KH, gustavo, Thomas Gleixner, paulmck,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Tony Luck,
	Frederic Weisbecker, Dan Carpenter, Masami Hiramatsu

On Wed, Feb 19, 2020 at 09:21:48AM -0800, Andy Lutomirski wrote:
> On Wed, Feb 19, 2020 at 9:13 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Feb 19, 2020 at 03:47:26PM +0100, Peter Zijlstra wrote:
> > > Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
> >
> > x86/mce: ...
> >
> > > It is an abomination; and in preparation of removing the whole
> > > ist_enter() thing, it needs to go.
> > >
> > > Convert #MC over to using task_work_add() instead; it will run the
> > > same code slightly later, on the return to user path of the same
> > > exception.
> >
> > That's fine because the error happened in userspace.
> 
> Unless there is a signal pending and the signal setup code is about to
> hit the same failed memory.  I suppose we can just treat cases like
> this as "oh well, time to kill the whole system".
> 
> But we should genuinely agree that we're okay with deferring this handling.

It doesn't delay much. The moment it does that local_irq_enable() it's
subject to preemption, just like it is on the return to user path.

Do you really want to create code that unwinds enough of nmi_enter() to
get you to a preemptible context? *shudder*


* Re: [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers
  2020-02-19 17:06     ` Peter Zijlstra
@ 2020-02-19 17:35       ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 17:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat, Mark Rutland, Marco Elver

On Wed, Feb 19, 2020 at 06:06:09PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:55:21AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 19, 2020 at 03:47:42PM +0100, Peter Zijlstra wrote:
> > > While the fallback wrappers aren't pure wrappers, they are trivial
> > > nonetheless, and the function they wrap should determine the final
> > > inlining policy.
> > > 
> > > For x86 tinyconfig we observe:
> > >  - vmlinux baseline: 1315988
> > >  - vmlinux with patch: 1315928 (-60 bytes)
> > > 
> > > Suggested-by: Mark Rutland <mark.rutland@arm.com>
> > > Signed-off-by: Marco Elver <elver@google.com>
> > > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > And this one and the previous one are also already in -tip, FYI.
> 
> That's where I found them ;-) Stole them from tip/locking/kcsan.

As long as the heist is official, then.  ;-)

							Thanx, Paul


* Re: [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation
  2020-02-19 16:54         ` Peter Zijlstra
@ 2020-02-19 17:36           ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 17:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat, Marco Elver,
	Mark Rutland

On Wed, Feb 19, 2020 at 05:54:55PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:50:20AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 19, 2020 at 05:03:18PM +0100, Peter Zijlstra wrote:
> > > On Wed, Feb 19, 2020 at 10:46:26AM -0500, Steven Rostedt wrote:
> > > > On Wed, 19 Feb 2020 15:47:40 +0100
> > > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > > 
> > > > > From: Marco Elver <elver@google.com>
> > > > > 
> > > > > This adds KCSAN instrumentation to atomic-instrumented.h.
> > > > > 
> > > > > Signed-off-by: Marco Elver <elver@google.com>
> > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > > [peterz: removed the actual kcsan hooks]
> > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > > > > ---
> > > > >  include/asm-generic/atomic-instrumented.h |  390 +++++++++++++++---------------
> > > > >  scripts/atomic/gen-atomic-instrumented.sh |   14 -
> > > > >  2 files changed, 212 insertions(+), 192 deletions(-)
> > > > > 
> > > > 
> > > > 
> > > > Does this and the rest of the series depend on the previous patches in
> > > > the series? Or can this be a series on to itself (patches 16-22)?
> > > 
> > > It can probably stand on its own, but it very much is related in so far
> > > that it's fallout from staring at all this nonsense.
> > > 
> > > Without these the do_int3() can actually have accidental tracing before
> > > reaching its nmi_enter().
> > 
> > The original is already in -tip, so some merge magic will be required.
> 
> Yes, So I don't strictly need this one, but I do need the two next
> patches adding __always_inline to everything. I figured it was easier to
> also pick this one (and butcher it) than to rebase everything.
> 
> I didn't want to depend on the locking/kcsan tree, and if this goes in,
> we do have to do something 'funny' there. Maybe rebase, maybe put in a
> few kcsan stubs so the original patch at least compiles. We'll see :/

Fair enough!

							Thanx, Paul


* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 17:21         ` Steven Rostedt
@ 2020-02-19 17:40           ` Paul E. McKenney
  2020-02-19 18:00             ` Steven Rostedt
  0 siblings, 1 reply; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 17:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 12:21:16PM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 18:05:07 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Feb 19, 2020 at 05:47:36PM +0100, Peter Zijlstra wrote:
> > > On Wed, Feb 19, 2020 at 08:43:56AM -0800, Paul E. McKenney wrote:  
> > > > On Wed, Feb 19, 2020 at 03:47:37PM +0100, Peter Zijlstra wrote:  
> > > > > Effectively revert commit 865e63b04e9b2 ("tracing: Add back in
> > > > > rcu_irq_enter/exit_irqson() for rcuidle tracepoints") now that we've
> > > > > taught perf how to deal with not having an RCU context provided.
> > > > > 
> > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > ---
> > > > >  include/linux/tracepoint.h |    8 ++------
> > > > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > > > 
> > > > > --- a/include/linux/tracepoint.h
> > > > > +++ b/include/linux/tracepoint.h
> > > > > @@ -179,10 +179,8 @@ static inline struct tracepoint *tracepo  
> > > > 
> > > > Shouldn't we also get rid of this line above?
> > > > 
> > > > 		int __maybe_unused __idx = 0;				\
> > > >   
> > > 
> > > Probably makes a lot of sense, lemme fix that!  
> > 
> > Oh wait, no! SRCU is the one that remains !
> 
> Correct, and if rcuidle is not set, and this is a macro, the SRCU
> portion is compiled out.

Sigh!  Apologies for the noise!

If we are using SRCU, we don't care whether or not RCU is watching.  OK,
maybe finally catching up -- the whole point was use of RCU in other
tracing code, wasn't it?

						Thanx, Paul


* Re: [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE
  2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
  2020-02-19 17:18       ` Joel Fernandes
@ 2020-02-19 17:41       ` Paul E. McKenney
  2020-02-20  5:54       ` Masami Hiramatsu
  2 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 17:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 12:16:09PM -0500, Steven Rostedt wrote:
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> It's confusing that rcu_nmi_enter() is marked NOKPROBE and
> rcu_nmi_exit() is not. One may think that the exit needs to be marked
> for the same reason the enter is, as rcu_nmi_exit() reverts the RCU
> state back to what it was before rcu_nmi_enter(). But the reason has
> nothing to do with the state of RCU.
> 
> The breakpoint handler (int3 on x86) must not have any kprobe on it
> until the kprobe handler is called. Otherwise, it can cause an infinite
> recursion and crash the machine. It just so happens that
> rcu_nmi_enter() is called by the int3 handler before the kprobe handler
> can run, and therefore needs to be marked as NOKPROBE.
> 
> Comment this to remove the confusion to why rcu_nmi_enter() is marked
> NOKPROBE but rcu_nmi_exit() is not.
> 
> Link: https://lore.kernel.org/r/20200213163800.5c51a5f1@gandalf.local.home
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1694a6b57ad8..ada7b2b638fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -846,6 +846,14 @@ void rcu_nmi_enter(void)
>  {
>  	rcu_nmi_enter_common(false);
>  }
> +/*
> + * All functions called in the breakpoint trap handler (e.g. do_int3()
> + * on x86), must not allow kprobes until the kprobe breakpoint handler
> + * is called, otherwise it can cause an infinite recursion.
> + * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> + * before the kprobe breakpoint handler is called, thus it must be
> + * marked as NOKPROBE.
> + */
>  NOKPROBE_SYMBOL(rcu_nmi_enter);
>  
>  /**


* Re: [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  2020-02-19 17:03       ` Peter Zijlstra
@ 2020-02-19 17:42         ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 17:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, josh, mathieu.desnoyers, jiangshanlai, luto, tony.luck,
	frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 06:03:04PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 05:37:00PM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 19, 2020 at 08:31:56AM -0800, Paul E. McKenney wrote:
> 
> > > Here is the latest version of that comment, posted by Steve Rostedt.
> > > 
> > > 							Thanx, Paul
> > > 
> > > /*
> > >  * All functions called in the breakpoint trap handler (e.g. do_int3()
> > >  * on x86), must not allow kprobes until the kprobe breakpoint handler
> > >  * is called, otherwise it can cause an infinite recursion.
> > >  * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> > >  * before the kprobe breakpoint handler is called, thus it must be
> > >  * marked as NOKPROBE.
> > >  */
> > 
> > Oh right, let me stick that in a separate patch. Best we not loose that
> > I suppose ;-)
> 
> Having gone over the old thread, I ended up with the below. Anyone
> holler if I got it wrong somehow.

Looks good to me!

							Thanx, Paul

> ---
> Subject: rcu: Provide comment for NOKPROBE() on rcu_nmi_enter()
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> The rcu_nmi_enter() function was marked NOKPROBE() by commit
> c13324a505c77 ("x86/kprobes: Prohibit probing on functions before
> kprobe_int3_handler()") because the do_int3() call kprobe code must
> not be invoked before kprobe_int3_handler() is called.  It turns out
> that ist_enter() (in do_int3()) calls rcu_nmi_enter(), hence the
> marking NOKPROBE() being added to rcu_nmi_enter().
> 
> This commit therefore adds a comment documenting this line of
> reasoning.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/rcu/tree.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -842,6 +842,14 @@ void rcu_nmi_enter(void)
>  {
>  	rcu_nmi_enter_common(false);
>  }
> +/*
> + * All functions called in the breakpoint trap handler (e.g. do_int3()
> + * on x86), must not allow kprobes until the kprobe breakpoint handler
> + * is called, otherwise it can cause an infinite recursion.
> + * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> + * before the kprobe breakpoint handler is called, thus it must be
> + * marked as NOKPROBE.
> + */
>  NOKPROBE_SYMBOL(rcu_nmi_enter);
>  
>  /**


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 17:21     ` Andy Lutomirski
  2020-02-19 17:33       ` Peter Zijlstra
@ 2020-02-19 17:42       ` Borislav Petkov
  2020-02-19 17:46         ` Peter Zijlstra
  1 sibling, 1 reply; 99+ messages in thread
From: Borislav Petkov @ 2020-02-19 17:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, LKML, linux-arch, Steven Rostedt, Ingo Molnar,
	Joel Fernandes, Greg KH, gustavo, Thomas Gleixner, paulmck,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Tony Luck,
	Frederic Weisbecker, Dan Carpenter, Masami Hiramatsu

On Wed, Feb 19, 2020 at 09:21:48AM -0800, Andy Lutomirski wrote:
> Unless there is a signal pending and the signal setup code is about to
> hit the same failed memory.  I suppose we can just treat cases like
> this as "oh well, time to kill the whole system".
>
> But we should genuinely agree that we're okay with deferring this handling.

Good catch!

static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
{

	...

		/* deal with pending signal delivery */
                if (cached_flags & _TIF_SIGPENDING)
                        do_signal(regs);

                if (cached_flags & _TIF_NOTIFY_RESUME) {
                        clear_thread_flag(TIF_NOTIFY_RESUME);
                        tracehook_notify_resume(regs);
                        rseq_handle_notify_resume(NULL, regs);
                }


Err, can we make task_work run before we handle signals? Or there's a
reason it is run in this order?

Comment over task_work_add() says:

 * This is like the signal handler which runs in kernel mode, but it doesn't
 * try to wake up the @task.

which sounds to me like this should really run before the signal
handlers...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 17:42       ` Borislav Petkov
@ 2020-02-19 17:46         ` Peter Zijlstra
  0 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-19 17:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, LKML, linux-arch, Steven Rostedt, Ingo Molnar,
	Joel Fernandes, Greg KH, gustavo, Thomas Gleixner, paulmck,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Tony Luck,
	Frederic Weisbecker, Dan Carpenter, Masami Hiramatsu

On Wed, Feb 19, 2020 at 06:42:23PM +0100, Borislav Petkov wrote:
> On Wed, Feb 19, 2020 at 09:21:48AM -0800, Andy Lutomirski wrote:
> > Unless there is a signal pending and the signal setup code is about to
> > hit the same failed memory.  I suppose we can just treat cases like
> > this as "oh well, time to kill the whole system".
> >
> > But we should genuinely agree that we're okay with deferring this handling.
> 
> Good catch!
> 
> static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
> {
> 
> 	...
> 
> 		/* deal with pending signal delivery */
>                 if (cached_flags & _TIF_SIGPENDING)
>                         do_signal(regs);
> 
>                 if (cached_flags & _TIF_NOTIFY_RESUME) {
>                         clear_thread_flag(TIF_NOTIFY_RESUME);
>                         tracehook_notify_resume(regs);
>                         rseq_handle_notify_resume(NULL, regs);
>                 }
> 
> 
> Err, can we make task_work run before we handle signals? Or there's a
> reason it is run in this order?
> 
> Comment over task_work_add() says:
> 
>  * This is like the signal handler which runs in kernel mode, but it doesn't
>  * try to wake up the @task.
> 
> which sounds to me like this should really run before the signal
> handlers...

here goes...

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -155,16 +155,16 @@ static void exit_to_usermode_loop(struct
 		if (cached_flags & _TIF_PATCH_PENDING)
 			klp_update_patch_state(current);
 
-		/* deal with pending signal delivery */
-		if (cached_flags & _TIF_SIGPENDING)
-			do_signal(regs);
-
 		if (cached_flags & _TIF_NOTIFY_RESUME) {
 			clear_thread_flag(TIF_NOTIFY_RESUME);
 			tracehook_notify_resume(regs);
 			rseq_handle_notify_resume(NULL, regs);
 		}
 
+		/* deal with pending signal delivery */
+		if (cached_flags & _TIF_SIGPENDING)
+			do_signal(regs);
+
 		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
 			fire_user_return_notifiers();
 


* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 17:40           ` Paul E. McKenney
@ 2020-02-19 18:00             ` Steven Rostedt
  2020-02-19 19:05               ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Steven Rostedt @ 2020-02-19 18:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 09:40:25 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Correct, and if rcuidle is not set, and this is a macro, the SRCU
> > portion is compiled out.  
> 
> Sigh!  Apologies for the noise!
> 
> If we are using SRCU, we don't care whether or not RCU is watching.  OK,
> maybe finally catching up -- the whole point was use of RCU in other
> tracing code, wasn't it?

Some callbacks (namely perf) might use RCU, but then the callbacks
need to make sure rcu is watching.

-- Steve


* Re: [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again)
  2020-02-19 18:00             ` Steven Rostedt
@ 2020-02-19 19:05               ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-19 19:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 01:00:12PM -0500, Steven Rostedt wrote:
> On Wed, 19 Feb 2020 09:40:25 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Correct, and if rcuidle is not set, and this is a macro, the SRCU
> > > portion is compiled out.  
> > 
> > Sigh!  Apologies for the noise!
> > 
> > If we are using SRCU, we don't care whether or not RCU is watching.  OK,
> > maybe finally catching up -- the whole point was use of RCU in other
> > tracing code, wasn't it?
> 
> Some callbacks (namely perf) might use RCU, but then the callbacks
> need to make sure rcu is watching.

Got it, thank you!

							Thanx, Paul


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 17:33       ` Peter Zijlstra
@ 2020-02-19 22:12         ` Andy Lutomirski
  2020-02-19 22:33           ` Luck, Tony
  2020-02-20  7:39           ` Peter Zijlstra
  0 siblings, 2 replies; 99+ messages in thread
From: Andy Lutomirski @ 2020-02-19 22:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Borislav Petkov, LKML, linux-arch,
	Steven Rostedt, Ingo Molnar, Joel Fernandes, Greg KH, gustavo,
	Thomas Gleixner, paulmck, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Tony Luck, Frederic Weisbecker, Dan Carpenter,
	Masami Hiramatsu

On Wed, Feb 19, 2020 at 9:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Feb 19, 2020 at 09:21:48AM -0800, Andy Lutomirski wrote:
> > On Wed, Feb 19, 2020 at 9:13 AM Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Wed, Feb 19, 2020 at 03:47:26PM +0100, Peter Zijlstra wrote:
> > > > Subject: Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
> > >
> > > x86/mce: ...
> > >
> > > > It is an abomination; and in preparation of removing the whole
> > > > ist_enter() thing, it needs to go.
> > > >
> > > > Convert #MC over to using task_work_add() instead; it will run the
> > > > same code slightly later, on the return to user path of the same
> > > > exception.
> > >
> > > That's fine because the error happened in userspace.
> >
> > Unless there is a signal pending and the signal setup code is about to
> > hit the same failed memory.  I suppose we can just treat cases like
> > this as "oh well, time to kill the whole system".
> >
> > But we should genuinely agree that we're okay with deferring this handling.
>
> It doesn't delay much. The moment it does that local_irq_enable() it's
> subject to preemption, just like it is on the return to user path.
>
> Do you really want to create code that unwinds enough of nmi_enter() to
> get you to a preemptible context? *shudder*

Well, there's another way to approach this:

void notrace nonothing do_machine_check(struct pt_regs *regs)
{
  if (user_mode(regs))
    do_sane_machine_check(regs);
  else
    do_awful_machine_check(regs);
}

void do_sane_machine_check(regs)
{
  nothing special here.  just a regular exception, more or less.
}

void do_awful_macine_check(regs)
{
  basically an NMI.  No funny business, no recovery possible.
task_work_add() not allowed.
}

Or, even better, depending on how tglx's series shakes out, we could
build on this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/idtentry&id=ebd8303dda34ea21476e4493cee671d998e83a48

and actually have two separate do_machine_check entry points.  So we'd
do, roughly:

idtentry_ist ... normal_stack_entry=do_sane_machine_check
ist_stack_entry=do_awful_machine_check

and now there's no chance for confusion.

All of the above has some issues when Tony decides that he wants to
recover from specially annotated recoverable kernel memory accesses.
Then task_work_add() is a nonstarter, but the current
stack-switch-from-usermode *also* doesn't work.  I floated the idea of
also doing the stack switch if we come from an IF=1 context, but
that's starting to get nasty.

One big question here: are memory failure #MC exceptions synchronous
or can they be delayed?   If we get a memory failure, is it possible
that the #MC hits some random context and not the actual context where
the error occurred?

I suppose the general consideration I'm trying to get at is: is
task_work_add() actually useful at all here?  For the case when a
kernel thread does memcpy_mcsafe() or similar, task work registered
using task_work_add() will never run.

--Andy


* RE: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 22:12         ` Andy Lutomirski
@ 2020-02-19 22:33           ` Luck, Tony
  2020-02-19 22:48             ` Andy Lutomirski
  2020-02-20  7:39           ` Peter Zijlstra
  1 sibling, 1 reply; 99+ messages in thread
From: Luck, Tony @ 2020-02-19 22:33 UTC (permalink / raw)
  To: Andy Lutomirski, Peter Zijlstra
  Cc: Borislav Petkov, LKML, linux-arch, Steven Rostedt, Ingo Molnar,
	Joel Fernandes, Greg KH, gustavo, Thomas Gleixner, paulmck,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Frederic Weisbecker, Dan Carpenter, Masami Hiramatsu

> One big question here: are memory failure #MC exceptions synchronous
> or can they be delayed?   If we get a memory failure, is it possible
> that the #MC hits some random context and not the actual context where
> the error occurred?

There are a few cases:
1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
These aren't synchronous with any core execution. Using machine check to signal
was probably a mistake - compounded by it being broadcast :-(  Could pick any CPU
to handle (actually choose the first to arrive in do_machine_check()). That guy should
arrange to soft offline the affected page. Every CPU can return to what they were doing
before.

2) SRAR (Software recoverable action required)
These are synchronous. Starting with Skylake they may be signaled just to the thread
that hit the poison. Earlier generations broadcast.
	2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
	2b) Memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
	2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a

3) Fatal
Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shutdown.

-Tony


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 22:33           ` Luck, Tony
@ 2020-02-19 22:48             ` Andy Lutomirski
  0 siblings, 0 replies; 99+ messages in thread
From: Andy Lutomirski @ 2020-02-19 22:48 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Lutomirski, Peter Zijlstra, Borislav Petkov, LKML,
	linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes, Greg KH,
	gustavo, Thomas Gleixner, paulmck, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu


> On Feb 19, 2020, at 2:33 PM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> One big question here: are memory failure #MC exceptions synchronous
>> or can they be delayed?   If we get a memory failure, is it possible
>> that the #MC hits some random context and not the actual context where
>> the error occurred?
> 
> There are a few cases:
> 1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
> These aren't synchronous with any core execution. Using machine check to signal
> was probably a mistake - compounded by it being broadcast :-(  Could pick any CPU
> to handle (actually choose the first to arrive in do_machine_check()). That guy should
> arrange to soft offline the affected page. Every CPU can return to what they were doing
> before.

You could handle this by sending IPI-to-self and dealing with it in the interrupt handler. Or even wake a high-priority kthread or workqueue. irq_work may help. Relying on task_work or the non_atomic stuff seems silly - you can’t rely on anything about the interrupted context, and the context is more or less irrelevant anyway.

> 
> 2) SRAR (Software recoverable action required)
> These are synchronous. Starting with Skylake they may be signaled just to the thread
> that hit the poison. Earlier generations broadcast.

Here’s where dealing with one that came from kernel code is just nasty, right?

I would argue that, if IF=0, killing the machine is reasonable.  If IF=1, we should be okay.  Actually making this work sanely is gross, and arguably the goal should be minimizing grossness.

Perhaps, if we came from kernel mode, we should IPI-to-self and use a special vector that is idtentry, not apicinterrupt.  Or maybe even do this for entries from usermode just to keep everything consistent.

>    2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
>    2b) Memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
>    2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a
> 
> 3) Fatal
> Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shutdown.

Easy :)

It would be really, really nice if NMI was masked in MCE context.

> 
> -Tony


* Re: [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE
  2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
  2020-02-19 17:18       ` Joel Fernandes
  2020-02-19 17:41       ` Paul E. McKenney
@ 2020-02-20  5:54       ` Masami Hiramatsu
  2 siblings, 0 replies; 99+ messages in thread
From: Masami Hiramatsu @ 2020-02-20  5:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Peter Zijlstra, linux-kernel, linux-arch,
	mingo, joel, gregkh, gustavo, tglx, josh, mathieu.desnoyers,
	jiangshanlai, luto, tony.luck, frederic, dan.carpenter, mhiramat

On Wed, 19 Feb 2020 12:16:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
> 
> It's confusing that rcu_nmi_enter() is marked NOKPROBE and
> rcu_nmi_exit() is not. One may think that the exit needs to be marked
> for the same reason the enter is, as rcu_nmi_exit() reverts the RCU
> state back to what it was before rcu_nmi_enter(). But the reason has
> nothing to do with the state of RCU.
> 
> The breakpoint handler (int3 on x86) must not have any kprobe on it
> until the kprobe handler is called. Otherwise, it can cause an infinite
> recursion and crash the machine. It just so happens that
> rcu_nmi_enter() is called by the int3 handler before the kprobe handler
> can run, and therefore needs to be marked as NOKPROBE.
> 
> Comment this to remove the confusion to why rcu_nmi_enter() is marked
> NOKPROBE but rcu_nmi_exit() is not.

Looks good to me.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>

Thanks,

> 
> Link: https://lore.kernel.org/r/20200213163800.5c51a5f1@gandalf.local.home
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1694a6b57ad8..ada7b2b638fb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -846,6 +846,14 @@ void rcu_nmi_enter(void)
>  {
>  	rcu_nmi_enter_common(false);
>  }
> +/*
> + * All functions called in the breakpoint trap handler (e.g. do_int3()
> + * on x86), must not allow kprobes until the kprobe breakpoint handler
> + * is called, otherwise it can cause an infinite recursion.
> + * On some archs, rcu_nmi_enter() is called in the breakpoint handler
> + * before the kprobe breakpoint handler is called, thus it must be
> + * marked as NOKPROBE.
> + */
>  NOKPROBE_SYMBOL(rcu_nmi_enter);
>  
>  /**


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()
  2020-02-19 22:12         ` Andy Lutomirski
  2020-02-19 22:33           ` Luck, Tony
@ 2020-02-20  7:39           ` Peter Zijlstra
  1 sibling, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-20  7:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, LKML, linux-arch, Steven Rostedt, Ingo Molnar,
	Joel Fernandes, Greg KH, gustavo, Thomas Gleixner, paulmck,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Tony Luck,
	Frederic Weisbecker, Dan Carpenter, Masami Hiramatsu

On Wed, Feb 19, 2020 at 02:12:13PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 19, 2020 at 9:34 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > Do you really want to create code that unwinds enough of nmi_enter() to
> > get you to a preemptible context? *shudder*
> 
> Well, there's another way to approach this:
> 
> void notrace nonothing do_machine_check(struct pt_regs *regs)
> {
>   if (user_mode(regs))
>     do_sane_machine_check(regs);
>   else
>     do_awful_machine_check(regs);
> }
> 
> void do_sane_machine_check(regs)
> {
>   nothing special here.  just a regular exception, more or less.
> }
> 
> void do_awful_macine_check(regs)
> {
>   basically an NMI.  No funny business, no recovery possible.
> task_work_add() not allowed.
> }

Right, that looks like major surgery to the current code though; I'd
much prefer someone that knows that code do that.

> I suppose the general consideration I'm trying to get at is: is
> task_work_add() actually useful at all here?  For the case when a
> kernel thread does memcpy_mcsafe() or similar, task work registered
> using task_work_add() will never run.

task_work isn't at all useful when we didn't come from userspace. In
that case irq_work is the best option, but that doesn't provide a
preemptible context.
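
For the kernel-context case, the irq_work pattern referred to above looks roughly like this; a sketch against the generic irq_work API, with the mce-specific naming invented here:

	#include <linux/irq_work.h>

	static void mce_irq_work_cb(struct irq_work *w)
	{
		/*
		 * Runs from a self-IPI after the #MC handler returns:
		 * still not preemptible, but out of the NMI-like machine
		 * check context, so IRQ-safe kernel APIs become usable.
		 */
	}

	static DEFINE_IRQ_WORK(mce_irq_work, mce_irq_work_cb);

	/* From do_machine_check(), for errors hit in kernel context: */
	static void mce_kick_irq_work(void)
	{
		irq_work_queue(&mce_irq_work);	/* NMI-safe */
	}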


* Re: [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
  2020-02-19 15:31   ` Steven Rostedt
@ 2020-02-20  8:41   ` Will Deacon
  2020-02-20  9:19   ` Marc Zyngier
  2020-02-20 13:18   ` Petr Mladek
  3 siblings, 0 replies; 99+ messages in thread
From: Will Deacon @ 2020-02-20  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat, Marc Zyngier,
	Michael Ellerman, Petr Mladek

On Wed, Feb 19, 2020 at 03:47:25PM +0100, Peter Zijlstra wrote:
> Since there are already a number of sites (ARM64, PowerPC) that
> effectively nest nmi_enter(), let's make the primitive support this
> before adding even more.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/include/asm/hardirq.h |    4 ++--
>  arch/arm64/kernel/sdei.c         |   14 ++------------
>  arch/arm64/kernel/traps.c        |    8 ++------

For these arm64 bits:

Acked-by: Will Deacon <will@kernel.org>

Will


* Re: [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
  2020-02-19 15:31   ` Steven Rostedt
  2020-02-20  8:41   ` Will Deacon
@ 2020-02-20  9:19   ` Marc Zyngier
  2020-02-20 13:18   ` Petr Mladek
  3 siblings, 0 replies; 99+ messages in thread
From: Marc Zyngier @ 2020-02-20  9:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat, Will Deacon,
	Michael Ellerman, Petr Mladek

On 2020-02-19 14:47, Peter Zijlstra wrote:
> Since there are already a number of sites (ARM64, PowerPC) that
> effectively nest nmi_enter(), let's make the primitive support this
> before adding even more.
> 
> Cc: Will Deacon <will@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks for cleaning this up!

Acked-by: Marc Zyngier <maz@kernel.org>

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-19 16:44           ` Paul E. McKenney
@ 2020-02-20 10:34             ` Peter Zijlstra
  2020-02-20 13:58               ` Paul E. McKenney
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-20 10:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 08:44:50AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 19, 2020 at 05:35:35PM +0100, Peter Zijlstra wrote:

> > Possibly, and I suppose the current version is less obviously dependent
> > on the in_nmi() functionality as was the previous, seeing how Paul
> > frobbed that all the way into the rcu_irq_enter*() implementation.
> > 
> > So sure, I can go move it I suppose.
> 
> No objections here.

It now looks like so:

---
Subject: rcu,tracing: Create trace_rcu_{enter,exit}()
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Feb 12 09:18:57 CET 2020

To facilitate tracers that need RCU, add some helpers to wrap the
magic required.

The problem is that we can call into tracers (trace events and
function tracing) while RCU isn't watching and this can happen from
any context, including NMI.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/rcupdate.h |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -175,6 +175,35 @@ do { \
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif
 
+/**
+ * trace_rcu_enter - Force RCU to be active, for code that needs RCU readers
+ *
+ * Very similar to RCU_NONIDLE() above.
+ *
+ * Tracing can happen while RCU isn't active yet, for instance in the idle loop
+ * between rcu_idle_enter() and rcu_idle_exit(), or early in exception entry.
+ * RCU will happily ignore any read-side critical sections in this case.
+ *
+ * This function ensures that RCU is aware hereafter and the code can readily
+ * rely on RCU read-side critical sections working as expected.
+ *
+ * This function is NMI safe -- provided in_nmi() is correct and will nest up to
+ * INT_MAX/2 times.
+ */
+static inline int trace_rcu_enter(void)
+{
+	int state = !rcu_is_watching();
+	if (state)
+		rcu_irq_enter_irqsave();
+	return state;
+}
+
+static inline void trace_rcu_exit(int state)
+{
+	if (state)
+		rcu_irq_exit_irqsave();
+}
+
 /*
  * The init_rcu_head_on_stack() and destroy_rcu_head_on_stack() calls
  * are needed for dynamic initialization and destruction of rcu_head
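
Usage then ends up symmetric around the tracer body, e.g. (hypothetical callsite, not part of the patch):

	static void my_trace_callback(void)
	{
		int rcu_state = trace_rcu_enter();	/* no-op if RCU already watching */

		/* ... read-side work that may use rcu_dereference() etc ... */

		trace_rcu_exit(rcu_state);		/* undo only what we enabled */
	}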


* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-19 17:20       ` Peter Zijlstra
@ 2020-02-20 10:37         ` Dmitry Vyukov
  2020-02-20 12:06           ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Dmitry Vyukov @ 2020-02-20 10:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Wed, Feb 19, 2020 at 6:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Feb 19, 2020 at 05:30:25PM +0100, Peter Zijlstra wrote:
>
> > By inlining everything in poke_int3_handler() (except bsearch :/) we can
> > mark the whole function off limits to everything and call it a day. That
> > simplicity has been the guiding principle so far.
> >
> > Alternatively we can provide an __always_inline variant of bsearch().
>
> This reduces the __no_sanitize usage to just the exception entry
> (do_int3) and the critical function: poke_int3_handler().
>
> Is this more acceptable?

Let's say it's more acceptable.

Acked-by: Dmitry Vyukov <dvyukov@google.com>

I guess there is no ideal solution here.

Just a straw man proposal: expected number of elements is large enough
to make bsearch profitable, right? I see 1 is a common case, but the
other case has multiple entries.

> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -979,7 +979,7 @@ static __always_inline void *text_poke_a
>         return _stext + tp->rel_addr;
>  }
>
> -static int notrace __no_sanitize patch_cmp(const void *key, const void *elt)
> +static __always_inline int patch_cmp(const void *key, const void *elt)
>  {
>         struct text_poke_loc *tp = (struct text_poke_loc *) elt;
>
> @@ -989,7 +989,6 @@ static int notrace __no_sanitize patch_c
>                 return 1;
>         return 0;
>  }
> -NOKPROBE_SYMBOL(patch_cmp);
>
>  int notrace __no_sanitize poke_int3_handler(struct pt_regs *regs)
>  {
> @@ -1024,9 +1023,9 @@ int notrace __no_sanitize poke_int3_hand
>          * Skip the binary search if there is a single member in the vector.
>          */
>         if (unlikely(desc->nr_entries > 1)) {
> -               tp = bsearch(ip, desc->vec, desc->nr_entries,
> -                            sizeof(struct text_poke_loc),
> -                            patch_cmp);
> +               tp = __bsearch(ip, desc->vec, desc->nr_entries,
> +                              sizeof(struct text_poke_loc),
> +                              patch_cmp);
>                 if (!tp)
>                         goto out_put;
>         } else {
> --- a/include/linux/bsearch.h
> +++ b/include/linux/bsearch.h
> @@ -4,7 +4,29 @@
>
>  #include <linux/types.h>
>
> -void *bsearch(const void *key, const void *base, size_t num, size_t size,
> -             cmp_func_t cmp);
> +static __always_inline
> +void *__bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp)
> +{
> +       const char *pivot;
> +       int result;
> +
> +       while (num > 0) {
> +               pivot = base + (num >> 1) * size;
> +               result = cmp(key, pivot);
> +
> +               if (result == 0)
> +                       return (void *)pivot;
> +
> +               if (result > 0) {
> +                       base = pivot + size;
> +                       num--;
> +               }
> +               num >>= 1;
> +       }
> +
> +       return NULL;
> +}
> +
> +extern void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp);
>
>  #endif /* _LINUX_BSEARCH_H */
> --- a/lib/bsearch.c
> +++ b/lib/bsearch.c
> @@ -28,27 +28,9 @@
>   * the key and elements in the array are of the same type, you can use
>   * the same comparison function for both sort() and bsearch().
>   */
> -void __no_sanitize *bsearch(const void *key, const void *base, size_t num, size_t size,
> -             cmp_func_t cmp)
> +void *bsearch(const void *key, const void *base, size_t num, size_t size, cmp_func_t cmp)
>  {
> -       const char *pivot;
> -       int result;
> -
> -       while (num > 0) {
> -               pivot = base + (num >> 1) * size;
> -               result = cmp(key, pivot);
> -
> -               if (result == 0)
> -                       return (void *)pivot;
> -
> -               if (result > 0) {
> -                       base = pivot + size;
> -                       num--;
> -               }
> -               num >>= 1;
> -       }
> -
> -       return NULL;
> +       return __bsearch(key, base, num, size, cmp);
>  }
>  EXPORT_SYMBOL(bsearch);
>  NOKPROBE_SYMBOL(bsearch);


* Re: [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter()
  2020-02-19 14:47 ` [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter() Peter Zijlstra
@ 2020-02-20 10:54   ` Borislav Petkov
  2020-02-20 12:11     ` Peter Zijlstra
  0 siblings, 1 reply; 99+ messages in thread
From: Borislav Petkov @ 2020-02-20 10:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 03:47:27PM +0100, Peter Zijlstra wrote:
> @@ -1220,7 +1220,7 @@ static void mce_kill_me_maybe(struct cal
>   * MCE broadcast. However some CPUs might be broken beyond repair,
>   * so be always careful when synchronizing with others.
>   */
> -void do_machine_check(struct pt_regs *regs, long error_code)
> +notrace void do_machine_check(struct pt_regs *regs, long error_code)

Is there a convention where the notrace marker should come in the
function signature? I see all possible combinations while grepping...

>  {
>  	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
>  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> @@ -1254,10 +1254,10 @@ void do_machine_check(struct pt_regs *re
>  	 */
>  	int lmce = 1;
>  
> -	if (__mc_check_crashing_cpu(cpu))
> -		return;
> +	nmi_enter();
>  
> -	ist_enter(regs);
> +	if (__mc_check_crashing_cpu(cpu))
> +		goto out;
>  
>  	this_cpu_inc(mce_exception_count);
>  

Should that __mc_check_crashing_cpu() happen before nmi_enter? The
function is doing only a bunch of checks and clearing MSRs for bystander
CPUs...

> @@ -1346,7 +1346,7 @@ void do_machine_check(struct pt_regs *re
>  	sync_core();
>  
>  	if (worst != MCE_AR_SEVERITY && !kill_it)
> -		goto out_ist;
> +		goto out;
>  
>  	/* Fault was in user mode and we need to take some action */
>  	if ((m.cs & 3) == 3) {
> @@ -1362,10 +1362,11 @@ void do_machine_check(struct pt_regs *re
>  			mce_panic("Failed kernel mode recovery", &m, msg);
>  	}
>  
> -out_ist:
> -	ist_exit(regs);
> +out:
> +	nmi_exit();
>  }
>  EXPORT_SYMBOL_GPL(do_machine_check);
> +NOKPROBE_SYMBOL(do_machine_check);

Yah, that's a good idea regardless.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-20 10:37         ` Dmitry Vyukov
@ 2020-02-20 12:06           ` Peter Zijlstra
  2020-02-20 16:22             ` Dmitry Vyukov
  0 siblings, 1 reply; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-20 12:06 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Thu, Feb 20, 2020 at 11:37:32AM +0100, Dmitry Vyukov wrote:
> On Wed, Feb 19, 2020 at 6:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Feb 19, 2020 at 05:30:25PM +0100, Peter Zijlstra wrote:
> >
> > > By inlining everything in poke_int3_handler() (except bsearch :/) we can
> > > mark the whole function off limits to everything and call it a day. That
> > > simplicity has been the guiding principle so far.
> > >
> > > Alternatively we can provide an __always_inline variant of bsearch().
> >
> > This reduces the __no_sanitize usage to just the exception entry
> > (do_int3) and the critical function: poke_int3_handler().
> >
> > Is this more acceptable?
> 
> Let's say it's more acceptable.
> 
> Acked-by: Dmitry Vyukov <dvyukov@google.com>

Thanks, I'll go make it happen.

> I guess there is no ideal solution here.
> 
> Just a straw man proposal: expected number of elements is large enough
> to make bsearch profitable, right? I see 1 is a common case, but the
> other case has multiple entries.

Latency was the consideration; the linear search would dramatically
increase the runtime of the exception.

The current limit is 256 entries and we're hitting that quite often.

(we can trivially increase, but nobody has been able to show significant
benefits for that -- as of yet)


* Re: [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter()
  2020-02-20 10:54   ` Borislav Petkov
@ 2020-02-20 12:11     ` Peter Zijlstra
  0 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-20 12:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Thu, Feb 20, 2020 at 11:54:39AM +0100, Borislav Petkov wrote:
> On Wed, Feb 19, 2020 at 03:47:27PM +0100, Peter Zijlstra wrote:
> > @@ -1220,7 +1220,7 @@ static void mce_kill_me_maybe(struct cal
> >   * MCE broadcast. However some CPUs might be broken beyond repair,
> >   * so be always careful when synchronizing with others.
> >   */
> > -void do_machine_check(struct pt_regs *regs, long error_code)
> > +notrace void do_machine_check(struct pt_regs *regs, long error_code)
> 
> Is there a convention where the notrace marker should come in the
> function signature? I see all possible combinations while grepping...

Same place as inline I think.

> >  {
> >  	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
> >  	DECLARE_BITMAP(toclear, MAX_NR_BANKS);
> > @@ -1254,10 +1254,10 @@ void do_machine_check(struct pt_regs *re
> >  	 */
> >  	int lmce = 1;
> >  
> > -	if (__mc_check_crashing_cpu(cpu))
> > -		return;
> > +	nmi_enter();
> >  
> > -	ist_enter(regs);
> > +	if (__mc_check_crashing_cpu(cpu))
> > +		goto out;
> >  
> >  	this_cpu_inc(mce_exception_count);
> >  
> 
> Should that __mc_check_crashing_cpu() happen before nmi_enter? The
> function is doing only a bunch of checks and clearing MSRs for bystander
> CPUs...

You'll note the lack of notrace on that function, and we must not call
into tracers before nmi_enter().

AFAICT there really is no benefit to trying to lift it before
nmi_enter().


* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-19 15:57       ` Peter Zijlstra
  2020-02-19 16:04         ` Peter Zijlstra
@ 2020-02-20 12:17         ` Borislav Petkov
  2020-02-20 12:37           ` Peter Zijlstra
  1 sibling, 1 reply; 99+ messages in thread
From: Borislav Petkov @ 2020-02-20 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai,
	luto, tony.luck, frederic, dan.carpenter, mhiramat

On Wed, Feb 19, 2020 at 04:57:15PM +0100, Peter Zijlstra wrote:
> -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> +		for (i = 0; i < count; i++) {
> +			int idx = (dst <= src) ? i : count - 1 - i;
> +			dst[idx] = src[idx];
> +		}

Or, you can actually unroll it. This way it even documents clearly what
it does:

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index fe38015ed50a..2b790a574ba5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -298,6 +298,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
 		regs->ip == (unsigned long)native_irq_return_iret)
 	{
 		struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
+		unsigned long *p = (unsigned long *)regs->sp;
 
 		/*
 		 * regs->sp points to the failing IRET frame on the
@@ -305,7 +306,11 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
 		 * in gpregs->ss through gpregs->ip.
 		 *
 		 */
-		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
+		gpregs->ip	= *p;
+		gpregs->cs	= *(p + 1);
+		gpregs->flags	= *(p + 2);
+		gpregs->sp	= *(p + 3);
+		gpregs->ss	= *(p + 4);
 		gpregs->orig_ax = 0;  /* Missing (lost) #GP error code */
 
 		/*

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE
  2020-02-20 12:17         ` Borislav Petkov
@ 2020-02-20 12:37           ` Peter Zijlstra
  0 siblings, 0 replies; 99+ messages in thread
From: Peter Zijlstra @ 2020-02-20 12:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai,
	luto, tony.luck, frederic, dan.carpenter, mhiramat

On Thu, Feb 20, 2020 at 01:17:27PM +0100, Borislav Petkov wrote:
> On Wed, Feb 19, 2020 at 04:57:15PM +0100, Peter Zijlstra wrote:
> > -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> > +		for (i = 0; i < count; i++) {
> > +			int idx = (dst <= src) ? i : count - 1 - i;
> > +			dst[idx] = src[idx];
> > +		}
> 
> Or, you can actually unroll it. This way it even documents clearly what
> it does:
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index fe38015ed50a..2b790a574ba5 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -298,6 +298,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
>  		regs->ip == (unsigned long)native_irq_return_iret)
>  	{
>  		struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1;
> +		unsigned long *p = (unsigned long *)regs->sp;
>  
>  		/*
>  		 * regs->sp points to the failing IRET frame on the
> @@ -305,7 +306,11 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
>  		 * in gpregs->ss through gpregs->ip.
>  		 *
>  		 */
> -		memmove(&gpregs->ip, (void *)regs->sp, 5*8);
> +		gpregs->ip	= *p;
> +		gpregs->cs	= *(p + 1);
> +		gpregs->flags	= *(p + 2);
> +		gpregs->sp	= *(p + 3);
> +		gpregs->ss	= *(p + 4);
>  		gpregs->orig_ax = 0;  /* Missing (lost) #GP error code */
>  
>  		/*

While I love that; is that actually correct? This is an unroll of
memcpy() not memmove(). IFF the ranges overlap, the above is buggered.

Was the original memmove() really needed?


* Re: [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter()
  2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
                     ` (2 preceding siblings ...)
  2020-02-20  9:19   ` Marc Zyngier
@ 2020-02-20 13:18   ` Petr Mladek
  3 siblings, 0 replies; 99+ messages in thread
From: Petr Mladek @ 2020-02-20 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, rostedt, mingo, joel, gregkh, gustavo,
	tglx, paulmck, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat, Will Deacon,
	Marc Zyngier, Michael Ellerman

On Wed 2020-02-19 15:47:25, Peter Zijlstra wrote:
> Since there are already a number of sites (ARM64, PowerPC) that
> effectively nest nmi_enter(), let's make the primitive support this
> before adding even more.

Reviewed-by: Petr Mladek <pmladek@suse.com>	# for printk part

The rest looks good as well.

Best Regards,
Petr


* Re: [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}()
  2020-02-20 10:34             ` Peter Zijlstra
@ 2020-02-20 13:58               ` Paul E. McKenney
  0 siblings, 0 replies; 99+ messages in thread
From: Paul E. McKenney @ 2020-02-20 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, linux-kernel, linux-arch, mingo, joel, gregkh,
	gustavo, tglx, josh, mathieu.desnoyers, jiangshanlai, luto,
	tony.luck, frederic, dan.carpenter, mhiramat

On Thu, Feb 20, 2020 at 11:34:21AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2020 at 08:44:50AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 19, 2020 at 05:35:35PM +0100, Peter Zijlstra wrote:
> 
> > > Possibly, and I suppose the current version is less obviously dependent
> > > on the in_nmi() functionality as was the previous, seeing how Paul
> > > frobbed that all the way into the rcu_irq_enter*() implementation.
> > > 
> > > So sure, I can go move it I suppose.
> > 
> > No objections here.
> 
> It now looks like so:
> 
> ---
> Subject: rcu,tracing: Create trace_rcu_{enter,exit}()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Feb 12 09:18:57 CET 2020
> 
> To facilitate tracers that need RCU, add some helpers to wrap the
> magic required.
> 
> The problem is that we can call into tracers (trace events and
> function tracing) while RCU isn't watching and this can happen from
> any context, including NMI.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  include/linux/rcupdate.h |   29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -175,6 +175,35 @@ do { \
>  #error "Unknown RCU implementation specified to kernel configuration"
>  #endif
>  
> +/**
> + * trace_rcu_enter - Force RCU to be active, for code that needs RCU readers
> + *
> + * Very similar to RCU_NONIDLE() above.
> + *
> + * Tracing can happen while RCU isn't active yet, for instance in the idle loop
> + * between rcu_idle_enter() and rcu_idle_exit(), or early in exception entry.
> + * RCU will happily ignore any read-side critical sections in this case.
> + *
> + * This function ensures that RCU is aware hereafter and the code can readily
> + * rely on RCU read-side critical sections working as expected.
> + *
> + * This function is NMI safe -- provided in_nmi() is correct -- and will nest
> + * up to INT_MAX/2 times.
> + */
> +static inline int trace_rcu_enter(void)
> +{
> +	int state = !rcu_is_watching();
> +	if (state)
> +		rcu_irq_enter_irqsave();
> +	return state;
> +}
> +
> +static inline void trace_rcu_exit(int state)
> +{
> +	if (state)
> +		rcu_irq_exit_irqsave();
> +}
> +
>  /*
>   * The init_rcu_head_on_stack() and destroy_rcu_head_on_stack() calls
>   * are needed for dynamic initialization and destruction of rcu_head


* Re: [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized
  2020-02-20 12:06           ` Peter Zijlstra
@ 2020-02-20 16:22             ` Dmitry Vyukov
  0 siblings, 0 replies; 99+ messages in thread
From: Dmitry Vyukov @ 2020-02-20 16:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, linux-arch, Steven Rostedt, Ingo Molnar, Joel Fernandes,
	Greg Kroah-Hartman, Gustavo A. R. Silva, Thomas Gleixner,
	Paul E. McKenney, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Andy Lutomirski, tony.luck, Frederic Weisbecker,
	Dan Carpenter, Masami Hiramatsu, Andrey Ryabinin, kasan-dev

On Thu, Feb 20, 2020 at 1:06 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Feb 20, 2020 at 11:37:32AM +0100, Dmitry Vyukov wrote:
> > On Wed, Feb 19, 2020 at 6:20 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Feb 19, 2020 at 05:30:25PM +0100, Peter Zijlstra wrote:
> > >
> > > > By inlining everything in poke_int3_handler() (except bsearch :/) we can
> > > > mark the whole function off limits to everything and call it a day. That
> > > > simplicity has been the guiding principle so far.
> > > >
> > > > Alternatively we can provide an __always_inline variant of bsearch().
> > >
> > > This reduces the __no_sanitize usage to just the exception entry
> > > (do_int3) and the critical function: poke_int3_handler().
> > >
> > > Is this more acceptable?
> >
> > Let's say it's more acceptable.
> >
> > Acked-by: Dmitry Vyukov <dvyukov@google.com>
>
> Thanks, I'll go make it happen.
>
> > I guess there is no ideal solution here.
> >
> > Just a straw man proposal: expected number of elements is large enough
> > to make bsearch profitable, right? I see 1 is a common case, but the
> > other case has multiple entries.
>
> Latency was the consideration; the linear search would dramatically
> increase the runtime of the exception.
>
> The current limit is 256 entries and we're hitting that quite often.
>
> (we can trivially increase, but nobody has been able to show significant
> benefits for that -- as of yet)

I see. Thanks for explaining. Just wanted to check because inlining a
linear search would free us from all these unpleasant problems.


end of thread, other threads:[~2020-02-20 16:22 UTC | newest]

Thread overview: 99+ messages
2020-02-19 14:47 [PATCH v3 00/22] tracing vs world Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 01/22] hardirq/nmi: Allow nested nmi_enter() Peter Zijlstra
2020-02-19 15:31   ` Steven Rostedt
2020-02-19 16:56     ` Borislav Petkov
2020-02-19 17:07       ` Peter Zijlstra
2020-02-20  8:41   ` Will Deacon
2020-02-20  9:19   ` Marc Zyngier
2020-02-20 13:18   ` Petr Mladek
2020-02-19 14:47 ` [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic() Peter Zijlstra
2020-02-19 17:13   ` Borislav Petkov
2020-02-19 17:21     ` Andy Lutomirski
2020-02-19 17:33       ` Peter Zijlstra
2020-02-19 22:12         ` Andy Lutomirski
2020-02-19 22:33           ` Luck, Tony
2020-02-19 22:48             ` Andy Lutomirski
2020-02-20  7:39           ` Peter Zijlstra
2020-02-19 17:42       ` Borislav Petkov
2020-02-19 17:46         ` Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 03/22] x86: Replace ist_enter() with nmi_enter() Peter Zijlstra
2020-02-20 10:54   ` Borislav Petkov
2020-02-20 12:11     ` Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 04/22] x86/doublefault: Make memmove() notrace/NOKPROBE Peter Zijlstra
2020-02-19 15:36   ` Steven Rostedt
2020-02-19 15:40     ` Peter Zijlstra
2020-02-19 15:55       ` Steven Rostedt
2020-02-19 15:57       ` Peter Zijlstra
2020-02-19 16:04         ` Peter Zijlstra
2020-02-19 16:12           ` Steven Rostedt
2020-02-19 16:27             ` Paul E. McKenney
2020-02-19 16:34               ` Peter Zijlstra
2020-02-19 16:46                 ` Paul E. McKenney
2020-02-19 17:05               ` Steven Rostedt
2020-02-20 12:17         ` Borislav Petkov
2020-02-20 12:37           ` Peter Zijlstra
2020-02-19 15:47   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 05/22] rcu: Make RCU IRQ enter/exit functions rely on in_nmi() Peter Zijlstra
2020-02-19 16:31   ` Paul E. McKenney
2020-02-19 16:37     ` Peter Zijlstra
2020-02-19 16:45       ` Paul E. McKenney
2020-02-19 17:03       ` Peter Zijlstra
2020-02-19 17:42         ` Paul E. McKenney
2020-02-19 17:16     ` [PATCH] rcu/kprobes: Comment why rcu_nmi_enter() is marked NOKPROBE Steven Rostedt
2020-02-19 17:18       ` Joel Fernandes
2020-02-19 17:41       ` Paul E. McKenney
2020-02-20  5:54       ` Masami Hiramatsu
2020-02-19 14:47 ` [PATCH v3 06/22] rcu: Rename rcu_irq_{enter,exit}_irqson() Peter Zijlstra
2020-02-19 16:38   ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 07/22] rcu: Mark rcu_dynticks_curr_cpu_in_eqs() inline Peter Zijlstra
2020-02-19 16:39   ` Paul E. McKenney
2020-02-19 17:19     ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 08/22] rcu,tracing: Create trace_rcu_{enter,exit}() Peter Zijlstra
2020-02-19 15:49   ` Steven Rostedt
2020-02-19 15:58     ` Peter Zijlstra
2020-02-19 16:15       ` Steven Rostedt
2020-02-19 16:35         ` Peter Zijlstra
2020-02-19 16:44           ` Paul E. McKenney
2020-02-20 10:34             ` Peter Zijlstra
2020-02-20 13:58               ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 09/22] sched,rcu,tracing: Avoid tracing before in_nmi() is correct Peter Zijlstra
2020-02-19 15:50   ` Steven Rostedt
2020-02-19 15:50   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 10/22] x86,tracing: Add comments to do_nmi() Peter Zijlstra
2020-02-19 15:51   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 11/22] perf,tracing: Prepare the perf-trace interface for RCU changes Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 12/22] tracing: Employ trace_rcu_{enter,exit}() Peter Zijlstra
2020-02-19 15:52   ` Steven Rostedt
2020-02-19 14:47 ` [PATCH v3 13/22] tracing: Remove regular RCU context for _rcuidle tracepoints (again) Peter Zijlstra
2020-02-19 15:53   ` Steven Rostedt
2020-02-19 16:43   ` Paul E. McKenney
2020-02-19 16:47     ` Peter Zijlstra
2020-02-19 17:05       ` Peter Zijlstra
2020-02-19 17:21         ` Steven Rostedt
2020-02-19 17:40           ` Paul E. McKenney
2020-02-19 18:00             ` Steven Rostedt
2020-02-19 19:05               ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 14/22] perf,tracing: Allow function tracing when !RCU Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 15/22] x86/int3: Ensure that poke_int3_handler() is not traced Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 16/22] locking/atomics, kcsan: Add KCSAN instrumentation Peter Zijlstra
2020-02-19 15:46   ` Steven Rostedt
2020-02-19 16:03     ` Peter Zijlstra
2020-02-19 16:50       ` Paul E. McKenney
2020-02-19 16:54         ` Peter Zijlstra
2020-02-19 17:36           ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 17/22] asm-generic/atomic: Use __always_inline for pure wrappers Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 18/22] asm-generic/atomic: Use __always_inline for fallback wrappers Peter Zijlstra
2020-02-19 16:55   ` Paul E. McKenney
2020-02-19 17:06     ` Peter Zijlstra
2020-02-19 17:35       ` Paul E. McKenney
2020-02-19 14:47 ` [PATCH v3 19/22] compiler: Simple READ/WRITE_ONCE() implementations Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 20/22] locking/atomics: Flip fallbacks and instrumentation Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 21/22] x86/int3: Avoid atomic instrumentation Peter Zijlstra
2020-02-19 14:47 ` [PATCH v3 22/22] x86/int3: Ensure that poke_int3_handler() is not sanitized Peter Zijlstra
2020-02-19 16:06   ` Dmitry Vyukov
2020-02-19 16:30     ` Peter Zijlstra
2020-02-19 16:51       ` Peter Zijlstra
2020-02-19 17:20       ` Peter Zijlstra
2020-02-20 10:37         ` Dmitry Vyukov
2020-02-20 12:06           ` Peter Zijlstra
2020-02-20 16:22             ` Dmitry Vyukov
