linux-kernel.vger.kernel.org archive mirror
* [GIT pull] irq/urgent for v5.10-rc6
@ 2020-11-29 13:37 Thomas Gleixner
  2020-11-29 13:38 ` [GIT pull] locking/urgent " Thomas Gleixner
  2020-11-29 20:06 ` [GIT pull] irq/urgent " pr-tracker-bot
  0 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2020-11-29 13:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest irq/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-2020-11-29

up to:  7032908cd584: Merge tag 'irqchip-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent


Two fixes for irqchip drivers:

  - Save and restore the GICV3 ITS state unconditionally on suspend/resume
    to handle firmware which fails to do so.

  - Use the correct index into the fwspec parameters to read the irq
    trigger type in the EXIU chip driver.

Thanks,

	tglx

------------------>
Chen Baozi (1):
      irqchip/exiu: Fix the index of fwspec for IRQ type

Xu Qiang (1):
      irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend


 drivers/irqchip/irq-gic-v3-its.c | 16 +++-------------
 drivers/irqchip/irq-sni-exiu.c   |  2 +-
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0fec31931e11..4069c215328b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -42,7 +42,6 @@
 #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING		(1ULL << 0)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_22375	(1ULL << 1)
 #define ITS_FLAGS_WORKAROUND_CAVIUM_23144	(1ULL << 2)
-#define ITS_FLAGS_SAVE_SUSPEND_STATE		(1ULL << 3)
 
 #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING	(1 << 0)
 #define RDIST_FLAGS_RD_TABLES_PREALLOCATED	(1 << 1)
@@ -4741,9 +4740,6 @@ static int its_save_disable(void)
 	list_for_each_entry(its, &its_nodes, entry) {
 		void __iomem *base;
 
-		if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE))
-			continue;
-
 		base = its->base;
 		its->ctlr_save = readl_relaxed(base + GITS_CTLR);
 		err = its_force_quiescent(base);
@@ -4762,9 +4758,6 @@ static int its_save_disable(void)
 		list_for_each_entry_continue_reverse(its, &its_nodes, entry) {
 			void __iomem *base;
 
-			if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE))
-				continue;
-
 			base = its->base;
 			writel_relaxed(its->ctlr_save, base + GITS_CTLR);
 		}
@@ -4784,9 +4777,6 @@ static void its_restore_enable(void)
 		void __iomem *base;
 		int i;
 
-		if (!(its->flags & ITS_FLAGS_SAVE_SUSPEND_STATE))
-			continue;
-
 		base = its->base;
 
 		/*
@@ -4794,7 +4784,10 @@ static void its_restore_enable(void)
 		 * don't restore it since writing to CBASER or BASER<n>
 		 * registers is undefined according to the GIC v3 ITS
 		 * Specification.
+		 *
+		 * Firmware resuming with the ITS enabled is terminally broken.
 		 */
+		WARN_ON(readl_relaxed(base + GITS_CTLR) & GITS_CTLR_ENABLE);
 		ret = its_force_quiescent(base);
 		if (ret) {
 			pr_err("ITS@%pa: failed to quiesce on resume: %d\n",
@@ -5074,9 +5067,6 @@ static int __init its_probe_one(struct resource *res,
 		ctlr |= GITS_CTLR_ImDe;
 	writel_relaxed(ctlr, its->base + GITS_CTLR);
 
-	if (GITS_TYPER_HCC(typer))
-		its->flags |= ITS_FLAGS_SAVE_SUSPEND_STATE;
-
 	err = its_init_domain(handle, its);
 	if (err)
 		goto out_free_tables;
diff --git a/drivers/irqchip/irq-sni-exiu.c b/drivers/irqchip/irq-sni-exiu.c
index 1d027623c776..abd011fcecf4 100644
--- a/drivers/irqchip/irq-sni-exiu.c
+++ b/drivers/irqchip/irq-sni-exiu.c
@@ -136,7 +136,7 @@ static int exiu_domain_translate(struct irq_domain *domain,
 		if (fwspec->param_count != 2)
 			return -EINVAL;
 		*hwirq = fwspec->param[0];
-		*type = fwspec->param[2] & IRQ_TYPE_SENSE_MASK;
+		*type = fwspec->param[1] & IRQ_TYPE_SENSE_MASK;
 	}
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [GIT pull] locking/urgent for v5.10-rc6
  2020-11-29 13:37 [GIT pull] irq/urgent for v5.10-rc6 Thomas Gleixner
@ 2020-11-29 13:38 ` Thomas Gleixner
  2020-11-29 19:31   ` Linus Torvalds
  2020-11-29 20:06   ` pr-tracker-bot
  2020-11-29 20:06 ` [GIT pull] irq/urgent " pr-tracker-bot
  1 sibling, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2020-11-29 13:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest locking/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-2020-11-29

up to:  6e1d2bc675bd: intel_idle: Fix intel_idle() vs tracing


Yet two more places which invoke tracing from RCU disabled regions in the
idle path. Similar to the entry path the low level idle functions have to
be non-instrumentable.

Thanks,

	tglx

------------------>
Peter Zijlstra (2):
      sched/idle: Fix arch_cpu_idle() vs tracing
      intel_idle: Fix intel_idle() vs tracing


 arch/alpha/kernel/process.c      |  2 +-
 arch/arm/kernel/process.c        |  2 +-
 arch/arm64/kernel/process.c      |  2 +-
 arch/csky/kernel/process.c       |  2 +-
 arch/h8300/kernel/process.c      |  2 +-
 arch/hexagon/kernel/process.c    |  2 +-
 arch/ia64/kernel/process.c       |  2 +-
 arch/microblaze/kernel/process.c |  2 +-
 arch/mips/kernel/idle.c          | 12 ++++++------
 arch/nios2/kernel/process.c      |  2 +-
 arch/openrisc/kernel/process.c   |  2 +-
 arch/parisc/kernel/process.c     |  2 +-
 arch/powerpc/kernel/idle.c       |  4 ++--
 arch/riscv/kernel/process.c      |  2 +-
 arch/s390/kernel/idle.c          |  6 +++---
 arch/sh/kernel/idle.c            |  2 +-
 arch/sparc/kernel/leon_pmc.c     |  4 ++--
 arch/sparc/kernel/process_32.c   |  2 +-
 arch/sparc/kernel/process_64.c   |  4 ++--
 arch/um/kernel/process.c         |  2 +-
 arch/x86/include/asm/mwait.h     |  2 --
 arch/x86/kernel/process.c        | 12 +++++++-----
 drivers/idle/intel_idle.c        | 37 ++++++++++++++++++++-----------------
 kernel/sched/idle.c              | 28 +++++++++++++++++++++++++++-
 24 files changed, 84 insertions(+), 55 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 7462a7911002..4c7b0414a3ff 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,7 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
 	wtint(0);
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 8e6ace03e960..9f199b1e8383 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -71,7 +71,7 @@ void arch_cpu_idle(void)
 		arm_pm_idle();
 	else
 		cpu_do_idle();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 4784011cecac..9ebe02574127 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -126,7 +126,7 @@ void arch_cpu_idle(void)
 	 * tricks
 	 */
 	cpu_do_idle();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
index f730869e21ee..69af6bc87e64 100644
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -102,6 +102,6 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
 	asm volatile("stop\n");
 #endif
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 #endif
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index aea0a40b77a9..bc1364db58fe 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -57,7 +57,7 @@ asmlinkage void ret_from_kernel_thread(void);
  */
 void arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 	__asm__("sleep");
 }
 
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index 5a0a95d93ddb..67767c5ed98c 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,7 @@ void arch_cpu_idle(void)
 {
 	__vmwait();
 	/*  interrupts wake us up, but irqs are still disabled */
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 6b61a703bcf5..c9ff8796b509 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -239,7 +239,7 @@ void arch_cpu_idle(void)
 	if (mark_idle)
 		(*mark_idle)(1);
 
-	safe_halt();
+	raw_safe_halt();
 
 	if (mark_idle)
 		(*mark_idle)(0);
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index a9e46e525cd0..f99860771ff4 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -149,5 +149,5 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpregs)
 
 void arch_cpu_idle(void)
 {
-       local_irq_enable();
+       raw_local_irq_enable();
 }
diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c
index 5bc3b04693c7..18e69ebf5691 100644
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -33,19 +33,19 @@ static void __cpuidle r3081_wait(void)
 {
 	unsigned long cfg = read_c0_conf();
 	write_c0_conf(cfg | R30XX_CONF_HALT);
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 static void __cpuidle r39xx_wait(void)
 {
 	if (!need_resched())
 		write_c0_conf(read_c0_conf() | TX39_CONF_HALT);
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void __cpuidle r4k_wait(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 	__r4k_wait();
 }
 
@@ -64,7 +64,7 @@ void __cpuidle r4k_wait_irqoff(void)
 		"	.set	arch=r4000	\n"
 		"	wait			\n"
 		"	.set	pop		\n");
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
@@ -84,7 +84,7 @@ static void __cpuidle rm7k_wait_irqoff(void)
 		"	wait						\n"
 		"	mtc0	$1, $12		# stalls until W stage	\n"
 		"	.set	pop					\n");
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
@@ -257,7 +257,7 @@ void arch_cpu_idle(void)
 	if (cpu_wait)
 		cpu_wait();
 	else
-		local_irq_enable();
+		raw_local_irq_enable();
 }
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
index 4ffe857e6ada..50b4eb19a6cc 100644
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(pm_power_off);
 
 void arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
index 0ff391f00334..3c98728cce24 100644
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -79,7 +79,7 @@ void machine_power_off(void)
  */
 void arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 	if (mfspr(SPR_UPR) & SPR_UPR_PMP)
 		mtspr(SPR_PMR, mfspr(SPR_PMR) | SPR_PMR_DME);
 }
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index f196d96e2f9f..a92a23d6acd9 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -169,7 +169,7 @@ void __cpuidle arch_cpu_idle_dead(void)
 
 void __cpuidle arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 
 	/* nop on real hardware, qemu will idle sleep. */
 	asm volatile("or %%r10,%%r10,%%r10\n":::);
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index ae0e2632393d..1f835539fda4 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -52,9 +52,9 @@ void arch_cpu_idle(void)
 		 * interrupts enabled, some don't.
 		 */
 		if (irqs_disabled())
-			local_irq_enable();
+			raw_local_irq_enable();
 	} else {
-		local_irq_enable();
+		raw_local_irq_enable();
 		/*
 		 * Go into low thread priority and possibly
 		 * low power mode.
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 19225ec65db6..dd5f985b1f40 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -36,7 +36,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
 void arch_cpu_idle(void)
 {
 	wait_for_interrupt();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void show_regs(struct pt_regs *regs)
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index f7f1e64e0d98..2b85096964f8 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -33,10 +33,10 @@ void enabled_wait(void)
 		PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK;
 	clear_cpu_flag(CIF_NOHZ_DELAY);
 
-	local_irq_save(flags);
+	raw_local_irq_save(flags);
 	/* Call the assembler magic in entry.S */
 	psw_idle(idle, psw_mask);
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 
 	/* Account time spent with enabled wait psw loaded as idle time. */
 	raw_write_seqcount_begin(&idle->seqcount);
@@ -123,7 +123,7 @@ void arch_cpu_idle_enter(void)
 void arch_cpu_idle(void)
 {
 	enabled_wait();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_exit(void)
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 0dc0f52f9bb8..f59814983bd5 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -22,7 +22,7 @@ static void (*sh_idle)(void);
 void default_idle(void)
 {
 	set_bl_bit();
-	local_irq_enable();
+	raw_local_irq_enable();
 	/* Isn't this racy ? */
 	cpu_sleep();
 	clear_bl_bit();
diff --git a/arch/sparc/kernel/leon_pmc.c b/arch/sparc/kernel/leon_pmc.c
index 065e2d4b7290..396f46bca52e 100644
--- a/arch/sparc/kernel/leon_pmc.c
+++ b/arch/sparc/kernel/leon_pmc.c
@@ -50,7 +50,7 @@ static void pmc_leon_idle_fixup(void)
 	register unsigned int address = (unsigned int)leon3_irqctrl_regs;
 
 	/* Interrupts need to be enabled to not hang the CPU */
-	local_irq_enable();
+	raw_local_irq_enable();
 
 	__asm__ __volatile__ (
 		"wr	%%g0, %%asr19\n"
@@ -66,7 +66,7 @@ static void pmc_leon_idle_fixup(void)
 static void pmc_leon_idle(void)
 {
 	/* Interrupts need to be enabled to not hang the CPU */
-	local_irq_enable();
+	raw_local_irq_enable();
 
 	/* For systems without power-down, this will be no-op */
 	__asm__ __volatile__ ("wr	%g0, %asr19\n\t");
diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
index adfcaeab3ddc..a02363735915 100644
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -74,7 +74,7 @@ void arch_cpu_idle(void)
 {
 	if (sparc_idle)
 		(*sparc_idle)();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /* XXX cli/sti -> local_irq_xxx here, check this works once SMP is fixed. */
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index a75093b993f9..6f8c7822fc06 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -62,11 +62,11 @@ void arch_cpu_idle(void)
 {
 	if (tlb_type != hypervisor) {
 		touch_nmi_watchdog();
-		local_irq_enable();
+		raw_local_irq_enable();
 	} else {
 		unsigned long pstate;
 
-		local_irq_enable();
+		raw_local_irq_enable();
 
                 /* The sun4v sleeping code requires that we have PSTATE.IE cleared over
                  * the cpu sleep hypervisor call.
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 3bed09538dd9..9505a7e87396 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -217,7 +217,7 @@ void arch_cpu_idle(void)
 {
 	cpu_tasks[current_thread_info()->cpu].pid = os_getpid();
 	um_idle_sleep();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 int __cant_sleep(void) {
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index e039a933aca3..29dd27b5a339 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -88,8 +88,6 @@ static inline void __mwaitx(unsigned long eax, unsigned long ebx,
 
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
-	trace_hardirqs_on();
-
 	mds_idle_clear_cpu_buffers();
 	/* "mwait %eax, %ecx;" */
 	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ba4593a913fa..145a7ac0c19a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -685,7 +685,7 @@ void arch_cpu_idle(void)
  */
 void __cpuidle default_idle(void)
 {
-	safe_halt();
+	raw_safe_halt();
 }
 #if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
 EXPORT_SYMBOL(default_idle);
@@ -736,6 +736,8 @@ void stop_this_cpu(void *dummy)
 /*
  * AMD Erratum 400 aware idle routine. We handle it the same way as C3 power
  * states (local apic timer and TSC stop).
+ *
+ * XXX this function is completely buggered vs RCU and tracing.
  */
 static void amd_e400_idle(void)
 {
@@ -757,9 +759,9 @@ static void amd_e400_idle(void)
 	 * The switch back from broadcast mode needs to be called with
 	 * interrupts disabled.
 	 */
-	local_irq_disable();
+	raw_local_irq_disable();
 	tick_broadcast_exit();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
@@ -801,9 +803,9 @@ static __cpuidle void mwait_idle(void)
 		if (!need_resched())
 			__sti_mwait(0, 0);
 		else
-			local_irq_enable();
+			raw_local_irq_enable();
 	} else {
-		local_irq_enable();
+		raw_local_irq_enable();
 	}
 	__current_clr_polling();
 }
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 01bace49a962..7ee7ffe22ae3 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -126,26 +126,9 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev,
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
 	unsigned long ecx = 1; /* break on interrupt flag */
-	bool tick;
-
-	if (!static_cpu_has(X86_FEATURE_ARAT)) {
-		/*
-		 * Switch over to one-shot tick broadcast if the target C-state
-		 * is deeper than C1.
-		 */
-		if ((eax >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) {
-			tick = true;
-			tick_broadcast_enter();
-		} else {
-			tick = false;
-		}
-	}
 
 	mwait_idle_with_hints(eax, ecx);
 
-	if (!static_cpu_has(X86_FEATURE_ARAT) && tick)
-		tick_broadcast_exit();
-
 	return index;
 }
 
@@ -1227,6 +1210,20 @@ static bool __init intel_idle_acpi_cst_extract(void)
 	return false;
 }
 
+static bool __init intel_idle_state_needs_timer_stop(struct cpuidle_state *state)
+{
+	unsigned long eax = flg2MWAIT(state->flags);
+
+	if (boot_cpu_has(X86_FEATURE_ARAT))
+		return false;
+
+	/*
+	 * Switch over to one-shot tick broadcast if the target C-state
+	 * is deeper than C1.
+	 */
+	return !!((eax >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK);
+}
+
 static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
 {
 	int cstate, limit = min_t(int, CPUIDLE_STATE_MAX, acpi_state_table.count);
@@ -1269,6 +1266,9 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
 		if (disabled_states_mask & BIT(cstate))
 			state->flags |= CPUIDLE_FLAG_OFF;
 
+		if (intel_idle_state_needs_timer_stop(state))
+			state->flags |= CPUIDLE_FLAG_TIMER_STOP;
+
 		state->enter = intel_idle;
 		state->enter_s2idle = intel_idle_s2idle;
 	}
@@ -1507,6 +1507,9 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 		     !(cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_ALWAYS_ENABLE)))
 			drv->states[drv->state_count].flags |= CPUIDLE_FLAG_OFF;
 
+		if (intel_idle_state_needs_timer_stop(&drv->states[drv->state_count]))
+			drv->states[drv->state_count].flags |= CPUIDLE_FLAG_TIMER_STOP;
+
 		drv->state_count++;
 	}
 
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 24d0ee26377d..c6932b8f4467 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -78,7 +78,7 @@ void __weak arch_cpu_idle_dead(void) { }
 void __weak arch_cpu_idle(void)
 {
 	cpu_idle_force_poll = 1;
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /**
@@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
 
 		trace_cpu_idle(1, smp_processor_id());
 		stop_critical_timings();
+
+		/*
+		 * arch_cpu_idle() is supposed to enable IRQs, however
+		 * we can't do that because of RCU and tracing.
+		 *
+		 * Trace IRQs enable here, then switch off RCU, and have
+		 * arch_cpu_idle() use raw_local_irq_enable(). Note that
+		 * rcu_idle_enter() relies on lockdep IRQ state, so switch that
+		 * last -- this is very similar to the entry code.
+		 */
+		trace_hardirqs_on_prepare();
+		lockdep_hardirqs_on_prepare(_THIS_IP_);
 		rcu_idle_enter();
+		lockdep_hardirqs_on(_THIS_IP_);
+
 		arch_cpu_idle();
+
+		/*
+		 * OK, so IRQs are enabled here, but RCU needs them disabled to
+		 * turn itself back on.. funny thing is that disabling IRQs
+		 * will cause tracing, which needs RCU. Jump through hoops to
+		 * make it 'work'.
+		 */
+		raw_local_irq_disable();
+		lockdep_hardirqs_off(_THIS_IP_);
 		rcu_idle_exit();
+		lockdep_hardirqs_on(_THIS_IP_);
+		raw_local_irq_enable();
+
 		start_critical_timings();
 		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	}



* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-29 13:38 ` [GIT pull] locking/urgent " Thomas Gleixner
@ 2020-11-29 19:31   ` Linus Torvalds
  2020-11-30  0:29     ` Paul E. McKenney
  2020-11-30  7:56     ` Peter Zijlstra
  2020-11-29 20:06   ` pr-tracker-bot
  1 sibling, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2020-11-29 19:31 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Paul E. McKenney
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Yet two more places which invoke tracing from RCU disabled regions in the
> idle path. Similar to the entry path the low level idle functions have to
> be non-instrumentable.

This really seems less than optimal.

In particular, lookie here:

> @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
>
>                 trace_cpu_idle(1, smp_processor_id());
>                 stop_critical_timings();
> +
> +               /*
> +                * arch_cpu_idle() is supposed to enable IRQs, however
> +                * we can't do that because of RCU and tracing.
> +                *
> +                * Trace IRQs enable here, then switch off RCU, and have
> +                * arch_cpu_idle() use raw_local_irq_enable(). Note that
> +                * rcu_idle_enter() relies on lockdep IRQ state, so switch that
> +                * last -- this is very similar to the entry code.
> +                */
> +               trace_hardirqs_on_prepare();
> +               lockdep_hardirqs_on_prepare(_THIS_IP_);
>                 rcu_idle_enter();
> +               lockdep_hardirqs_on(_THIS_IP_);
> +
>                 arch_cpu_idle();
> +
> +               /*
> +                * OK, so IRQs are enabled here, but RCU needs them disabled to
> +                * turn itself back on.. funny thing is that disabling IRQs
> +                * will cause tracing, which needs RCU. Jump through hoops to
> +                * make it 'work'.
> +                */
> +               raw_local_irq_disable();
> +               lockdep_hardirqs_off(_THIS_IP_);
>                 rcu_idle_exit();
> +               lockdep_hardirqs_on(_THIS_IP_);
> +               raw_local_irq_enable();
> +
>                 start_critical_timings();
>                 trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>         }

And look at what the code generation for the idle exit path is when
lockdep isn't even on.

It's *literally*

        cli
        call rcu_idle_exit
        sti

and guess what rcu_idle_exit does?

Yeah, that one does "pushf; cli; call rcu_eqs_exit; popf".

So here we are, in the somewhat critical "an interrupt woke us up"
section, and we're doing just ridiculously stupid things.

I've pulled this, because it solves a problem, but there's a deeper
problem here in how all this is done.

The idle path is actually quite important. I can point to real loads
where this is a big part of the CPU profile, because you end up having
lots of "go to sleep for very short times, because the thing we were
waiting for takes almost no time at all".

                 Linus


* Re: [GIT pull] irq/urgent for v5.10-rc6
  2020-11-29 13:37 [GIT pull] irq/urgent for v5.10-rc6 Thomas Gleixner
  2020-11-29 13:38 ` [GIT pull] locking/urgent " Thomas Gleixner
@ 2020-11-29 20:06 ` pr-tracker-bot
  1 sibling, 0 replies; 38+ messages in thread
From: pr-tracker-bot @ 2020-11-29 20:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 29 Nov 2020 13:37:59 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-2020-11-29

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8b7a51ba2637ee53ce90624f5f98aaf8ec9b2bcc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-29 13:38 ` [GIT pull] locking/urgent " Thomas Gleixner
  2020-11-29 19:31   ` Linus Torvalds
@ 2020-11-29 20:06   ` pr-tracker-bot
  1 sibling, 0 replies; 38+ messages in thread
From: pr-tracker-bot @ 2020-11-29 20:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 29 Nov 2020 13:38:00 -0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-2020-11-29

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f91a3aa6bce480fe6e08df540129f4a923222419

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-29 19:31   ` Linus Torvalds
@ 2020-11-30  0:29     ` Paul E. McKenney
  2020-11-30  7:56     ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Paul E. McKenney @ 2020-11-30  0:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Yet two more places which invoke tracing from RCU disabled regions in the
> > idle path. Similar to the entry path the low level idle functions have to
> > be non-instrumentable.
> 
> This really seems less than optimal.
> 
> In particular, lookie here:
> 
> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
> >
> >                 trace_cpu_idle(1, smp_processor_id());
> >                 stop_critical_timings();
> > +
> > +               /*
> > +                * arch_cpu_idle() is supposed to enable IRQs, however
> > +                * we can't do that because of RCU and tracing.
> > +                *
> > +                * Trace IRQs enable here, then switch off RCU, and have
> > +                * arch_cpu_idle() use raw_local_irq_enable(). Note that
> > +                * rcu_idle_enter() relies on lockdep IRQ state, so switch that
> > +                * last -- this is very similar to the entry code.
> > +                */
> > +               trace_hardirqs_on_prepare();
> > +               lockdep_hardirqs_on_prepare(_THIS_IP_);
> >                 rcu_idle_enter();
> > +               lockdep_hardirqs_on(_THIS_IP_);
> > +
> >                 arch_cpu_idle();
> > +
> > +               /*
> > +                * OK, so IRQs are enabled here, but RCU needs them disabled to
> > +                * turn itself back on.. funny thing is that disabling IRQs
> > +                * will cause tracing, which needs RCU. Jump through hoops to
> > +                * make it 'work'.
> > +                */
> > +               raw_local_irq_disable();
> > +               lockdep_hardirqs_off(_THIS_IP_);
> >                 rcu_idle_exit();
> > +               lockdep_hardirqs_on(_THIS_IP_);
> > +               raw_local_irq_enable();
> > +
> >                 start_critical_timings();
> >                 trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> >         }
> 
> And look at what the code generation for the idle exit path is when
> lockdep isn't even on.
> 
> It's *literally*
> 
>         cli
>         call rcu_idle_exit
>         sti
> 
> and guess what rcu_idle_exit does?
> 
> Yeah, that one does "pushf; cli; call rcu_eqs_exit; popf".
> 
> So here we are, in the somewhat critical "an interrupt woke us up"
> section, and we're doing just ridiculously stupid things.
> 
> I've pulled this, because it solves a problem, but there's a deeper
> problem here in how all this is done.
> 
> The idle path is actually quite important. I can point to real loads
> where this is a big part of the CPU profile, because you end up having
> lots of "go to sleep for very short times, because the thing we were
> waiting for takes almost no time at all".

This is because of the noinline implied by the noinstr on rcu_eqs_exit().
If I replace that with inline, it does get inlined.  Except that, if
I remember correctly, making that change messes up the tooling that
enforces the no-instrumentation regions.

I -think- that a combination of instrumentation_end() and s/noinstr/inline/
might work, but I will need to consult with the experts on this.

							Thanx, Paul


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-29 19:31   ` Linus Torvalds
  2020-11-30  0:29     ` Paul E. McKenney
@ 2020-11-30  7:56     ` Peter Zijlstra
  2020-11-30  8:23       ` Sven Schnelle
                         ` (2 more replies)
  1 sibling, 3 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-11-30  7:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Yet two more places which invoke tracing from RCU disabled regions in the
> > idle path. Similar to the entry path the low level idle functions have to
> > be non-instrumentable.
> 
> This really seems less than optimal.
> 
> In particular, lookie here:
> 
> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
> >
> >                 trace_cpu_idle(1, smp_processor_id());
> >                 stop_critical_timings();
> > +
> > +               /*
> > +                * arch_cpu_idle() is supposed to enable IRQs, however
> > +                * we can't do that because of RCU and tracing.
> > +                *
> > +                * Trace IRQs enable here, then switch off RCU, and have
> > +                * arch_cpu_idle() use raw_local_irq_enable(). Note that
> > +                * rcu_idle_enter() relies on lockdep IRQ state, so switch that
> > +                * last -- this is very similar to the entry code.
> > +                */
> > +               trace_hardirqs_on_prepare();
> > +               lockdep_hardirqs_on_prepare(_THIS_IP_);
> >                 rcu_idle_enter();
> > +               lockdep_hardirqs_on(_THIS_IP_);
> > +
> >                 arch_cpu_idle();
> > +
> > +               /*
> > +                * OK, so IRQs are enabled here, but RCU needs them disabled to
> > +                * turn itself back on.. funny thing is that disabling IRQs
> > +                * will cause tracing, which needs RCU. Jump through hoops to
> > +                * make it 'work'.
> > +                */
> > +               raw_local_irq_disable();
> > +               lockdep_hardirqs_off(_THIS_IP_);
> >                 rcu_idle_exit();
> > +               lockdep_hardirqs_on(_THIS_IP_);
> > +               raw_local_irq_enable();
> > +
> >                 start_critical_timings();
> >                 trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> >         }
> 
> And look at what the code generation for the idle exit path is when
> lockdep isn't even on.

Agreed.

The idea was to flip all of arch_cpu_idle() to not enable interrupts.

This is suboptimal for things like x86 where arch_cpu_idle() is
basically STI;HLT, but x86 isn't likely to actually use this code path
anyway, given all the various cpuidle drivers it has.

Many of the other archs are now doing things like arm's:
wfi();raw_local_irq_enable().

Doing that tree-wide interrupt-state flip was something I didn't want to
do at this late a stage, the chance of messing that up is just too high.

After that I need to go look at flipping cpuidle, which is even more
'interesting'. cpuidle_enter() has the exact same semantics, and this is
the code path that x86 actually uses, and here it's inconsistent at best.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30  7:56     ` Peter Zijlstra
@ 2020-11-30  8:23       ` Sven Schnelle
  2020-11-30 12:31         ` Sven Schnelle
  2020-11-30  8:36       ` Peter Zijlstra
  2020-11-30 17:55       ` Linus Torvalds
  2 siblings, 1 reply; 38+ messages in thread
From: Sven Schnelle @ 2020-11-30  8:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

Hi Peter,

Peter Zijlstra <peterz@infradead.org> writes:

> On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
>> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > Yet two more places which invoke tracing from RCU disabled regions in the
>> > idle path. Similar to the entry path the low level idle functions have to
>> > be non-instrumentable.
>> 
>> This really seems less than optimal.
>> 
>> In particular, lookie here:
>> 
>> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
>> >
>> >                 trace_cpu_idle(1, smp_processor_id());
>> >                 stop_critical_timings();
>> > +
>> > +               /*
>> > +                * arch_cpu_idle() is supposed to enable IRQs, however
>> > +                * we can't do that because of RCU and tracing.
>> > +                *
>> > +                * Trace IRQs enable here, then switch off RCU, and have
>> > +                * arch_cpu_idle() use raw_local_irq_enable(). Note that
>> > +                * rcu_idle_enter() relies on lockdep IRQ state, so switch that
>> > +                * last -- this is very similar to the entry code.
>> > +                */
>> > +               trace_hardirqs_on_prepare();
>> > +               lockdep_hardirqs_on_prepare(_THIS_IP_);
>> >                 rcu_idle_enter();
>> > +               lockdep_hardirqs_on(_THIS_IP_);
>> > +
>> >                 arch_cpu_idle();
>> > +
>> > +               /*
>> > +                * OK, so IRQs are enabled here, but RCU needs them disabled to
>> > +                * turn itself back on.. funny thing is that disabling IRQs
>> > +                * will cause tracing, which needs RCU. Jump through hoops to
>> > +                * make it 'work'.
>> > +                */
>> > +               raw_local_irq_disable();
>> > +               lockdep_hardirqs_off(_THIS_IP_);
>> >                 rcu_idle_exit();
>> > +               lockdep_hardirqs_on(_THIS_IP_);
>> > +               raw_local_irq_enable();
>> > +
>> >                 start_critical_timings();
>> >                 trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>> >         }
>> 
>> And look at what the code generation for the idle exit path is when
>> lockdep isn't even on.
>
> Agreed.
>
> The idea was to flip all of arch_cpu_idle() to not enable interrupts.
>
> This is suboptimal for things like x86 where arch_cpu_idle() is
> basically STI;HLT, but x86 isn't likely to actually use this code path
> anyway, given all the various cpuidle drivers it has.
>
> Many of the other archs are now doing things like arm's:
> wfi();raw_local_irq_enable().
>
> Doing that tree-wide interrupt-state flip was something I didn't want to
> do at this late a stage, the chance of messing that up is just too high.
>
> After that I need to go look at flipping cpuidle, which is even more
> 'interesting'. cpuidle_enter() has the exact same semantics, and this is
> the code path that x86 actually uses, and here it's inconsistent at best.

On s390 this introduces the following splat:

[    2.962283] Testing tracer wakeup_rt:  
[    3.017102] sched: DL replenish lagged too much 
[    3.061777] PASSED 
[    3.062076] Testing tracer wakeup_dl: PASSED 
[    3.161296] ------------[ cut here ]------------ 
[    3.161301] DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key != current->curr_chain_key) 
[    3.161310] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4155 lockdep_hardirqs_on+0x1ea/0x1f8 
[    3.161316] Modules linked in: 
[    3.161323] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc6-07209-g9e30ba6d2d5c #2249 
[    3.161327] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
[    3.161331] Krnl PSW : 0404d00180000000 0000000000d730fe (lockdep_hardirqs_on+0x1ee/0x1f8) 
[    3.161340]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 
[    3.161347] Krnl GPRS: c0000000ffffbfff 0000000080000001 000000000000004a 00000000001e33b8 
[    3.161352]            0000038000007b88 0000038000007b80 ffffffffffffffff 0000000000000000 
[    3.161357]            0000000000000000 000000000151a610 0000000001361500 0000000000d81d40 
[    3.161362]            0000000000010400 0000000000d88010 0000000000d730fa 0000038000007db8
 
[    3.161461] Krnl Code: 0000000000d730ee: c0200012d9ad        larl    %r2,0000000000fce448 
[    3.161461]            0000000000d730f4: c0e5ffff451a        brasl   %r14,0000000000d5bb28 
[    3.161461]           #0000000000d730fa: af000000            mc      0,0 
[    3.161461]           >0000000000d730fe: a7f4ff70            brc     15,0000000000d72fde 
[    3.161461]            0000000000d73102: 0707                bcr     0,%r7 
[    3.161461]            0000000000d73104: 0707                bcr     0,%r7 
[    3.161461]            0000000000d73106: 0707                bcr     0,%r7 
[    3.161461]            0000000000d73108: c418002884ec        lgrl    %r1,0000000001283ae0 
[    3.161518] Call Trace: 
[    3.161526]  [<0000000000d730fe>] lockdep_hardirqs_on+0x1ee/0x1f8  
[    3.161532] ([<0000000000d730fa>] lockdep_hardirqs_on+0x1ea/0x1f8) 
[    3.161538]  [<0000000000d81da6>] default_idle_call+0x96/0xd8  
[    3.161544]  [<0000000000199dba>] do_idle+0xf2/0x1b0  
[    3.161550]  [<000000000019a0ee>] cpu_startup_entry+0x36/0x40 
 
[    3.161604]  [<000000000141af2e>] arch_call_rest_init+0x76/0x80  
[    3.161645] INFO: lockdep is turned off. 
[    3.161649] Last Breaking-Event-Address: 
[    3.161661]  [<00000000001e436e>] vprintk_emit+0xde/0x1a8 
[    3.161700] irq event stamp: 10315 
[    3.161712] hardirqs last  enabled at (10313): [<00000000001aa3ec>] load_balance+0x2ac/0x9c0 
[    3.161751] hardirqs last disabled at (10314): [<0000000000d853c6>] __do_softirq+0x2de/0x3c8 
[    3.161756] softirqs last  enabled at (10315): [<0000000000d85384>] __do_softirq+0x29c/0x3c8 
[    3.161767] softirqs last disabled at (10308): [<000000000010dd70>] do_softirq_own_stack+0x70/0x80 
[    3.161802] ---[ end trace 4484b2a7468d1380 ]--- 
[    3.161887] Testing tracer function_graph: PASSED 
[    3.353303] prandom32: self test passed (less than 6 bits correlated) 
[    3.353319] prandom: seed boundary self test passed 

Haven't looked into it yet.

Regards
Sven

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30  7:56     ` Peter Zijlstra
  2020-11-30  8:23       ` Sven Schnelle
@ 2020-11-30  8:36       ` Peter Zijlstra
  2020-11-30 17:55       ` Linus Torvalds
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-11-30  8:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Nov 30, 2020 at 08:56:51AM +0100, Peter Zijlstra wrote:
> The idea was to flip all of arch_cpu_idle() to not enable interrupts.
> 
> This is suboptimal for things like x86 where arch_cpu_idle() is
> basically STI;HLT, but x86 isn't likely to actually use this code path
> anyway, given all the various cpuidle drivers it has.
> 
> Many of the other archs are now doing things like arm's:
> wfi();raw_local_irq_enable().
> 
> Doing that tree-wide interrupt-state flip was something I didn't want to
> do at this late a stage, the chance of messing that up is just too high.

It looks something like the below, but this needs to soak a bit because
who knows, I might've missed some obscure case ...

And then, after I've also flipped cpuidle, I need to go noinstr annotate
all the actual idle code :-/ (which is going to be a pain due to chasing
function pointers).

---
 arch/alpha/kernel/process.c      |  1 -
 arch/arc/kernel/process.c        |  3 +++
 arch/arm/kernel/process.c        |  1 -
 arch/arm/mach-gemini/board-dt.c  |  3 ++-
 arch/arm64/kernel/process.c      |  1 -
 arch/c6x/kernel/process.c        |  1 +
 arch/csky/kernel/process.c       |  1 -
 arch/h8300/kernel/process.c      |  1 +
 arch/hexagon/kernel/process.c    |  1 -
 arch/ia64/kernel/process.c       |  1 +
 arch/microblaze/kernel/process.c |  1 -
 arch/mips/kernel/idle.c          |  7 +------
 arch/nios2/kernel/process.c      |  1 -
 arch/openrisc/kernel/process.c   |  1 +
 arch/parisc/kernel/process.c     |  2 --
 arch/powerpc/kernel/idle.c       |  5 ++---
 arch/riscv/kernel/process.c      |  1 -
 arch/s390/kernel/idle.c          |  1 -
 arch/sh/kernel/idle.c            |  1 +
 arch/sparc/kernel/leon_pmc.c     |  4 ++++
 arch/sparc/kernel/process_32.c   |  1 -
 arch/sparc/kernel/process_64.c   |  3 ++-
 arch/um/kernel/process.c         |  1 -
 arch/x86/kernel/process.c        | 14 ++++----------
 arch/xtensa/kernel/process.c     |  1 +
 kernel/sched/idle.c              | 10 +++-------
 26 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 4c7b0414a3ff..d686e8faec6e 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
 	wtint(0);
-	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 37f724ad5e39..1d0832b28d09 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -114,6 +114,8 @@ void arch_cpu_idle(void)
 		"sleep %0	\n"
 		:
 		:"I"(arg)); /* can't be "r" has to be embedded const */
+
+	raw_local_irq_disable();
 }
 
 #else	/* ARC700 */
@@ -122,6 +124,7 @@ void arch_cpu_idle(void)
 {
 	/* sleep, but enable both set E1/E2 (levels of interrutps) before committing */
 	__asm__ __volatile__("sleep 0x3	\n");
+	raw_local_irq_disable();
 }
 
 #endif
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 9f199b1e8383..f9c97caa52d1 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -71,7 +71,6 @@ void arch_cpu_idle(void)
 		arm_pm_idle();
 	else
 		cpu_do_idle();
-	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
diff --git a/arch/arm/mach-gemini/board-dt.c b/arch/arm/mach-gemini/board-dt.c
index de0afcc8d94a..fbafe7475c02 100644
--- a/arch/arm/mach-gemini/board-dt.c
+++ b/arch/arm/mach-gemini/board-dt.c
@@ -42,8 +42,9 @@ static void gemini_idle(void)
 	 */
 
 	/* FIXME: Enabling interrupts here is racy! */
-	local_irq_enable();
+	raw_local_irq_enable();
 	cpu_do_idle();
+	raw_local_irq_disable();
 }
 
 static void __init gemini_init_machine(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 7697a4b48b7c..07f551879dc5 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -126,7 +126,6 @@ void arch_cpu_idle(void)
 	 * tricks
 	 */
 	cpu_do_idle();
-	raw_local_irq_enable();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/c6x/kernel/process.c b/arch/c6x/kernel/process.c
index 9f4fd6a40a10..9e638155ca76 100644
--- a/arch/c6x/kernel/process.c
+++ b/arch/c6x/kernel/process.c
@@ -45,6 +45,7 @@ void arch_cpu_idle(void)
 		      "   mvc .s2 %0,CSR\n"
 		      "|| idle\n"
 		      : "=b"(tmp));
+	raw_local_irq_disable();
 }
 
 static void halt_loop(void)
diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
index 69af6bc87e64..bae792dca7a3 100644
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -102,6 +102,5 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
 	asm volatile("stop\n");
 #endif
-	raw_local_irq_enable();
 }
 #endif
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index bc1364db58fe..59e301c3d805 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -59,6 +59,7 @@ void arch_cpu_idle(void)
 {
 	raw_local_irq_enable();
 	__asm__("sleep");
+	raw_local_irq_disable();
 }
 
 void machine_restart(char *__unused)
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index 67767c5ed98c..be1fa449b96f 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,6 @@ void arch_cpu_idle(void)
 {
 	__vmwait();
 	/*  interrupts wake us up, but irqs are still disabled */
-	raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index c9ff8796b509..a2851080c491 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -240,6 +240,7 @@ void arch_cpu_idle(void)
 		(*mark_idle)(1);
 
 	raw_safe_halt();
+	raw_local_irq_disable();
 
 	if (mark_idle)
 		(*mark_idle)(0);
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index f99860771ff4..83a8110a758c 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -149,5 +149,4 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpregs)
 
 void arch_cpu_idle(void)
 {
-       raw_local_irq_enable();
 }
diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c
index 18e69ebf5691..1b1e157bc27e 100644
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -33,20 +33,19 @@ static void __cpuidle r3081_wait(void)
 {
 	unsigned long cfg = read_c0_conf();
 	write_c0_conf(cfg | R30XX_CONF_HALT);
-	raw_local_irq_enable();
 }
 
 static void __cpuidle r39xx_wait(void)
 {
 	if (!need_resched())
 		write_c0_conf(read_c0_conf() | TX39_CONF_HALT);
-	raw_local_irq_enable();
 }
 
 void __cpuidle r4k_wait(void)
 {
 	raw_local_irq_enable();
 	__r4k_wait();
+	raw_local_irq_disable();
 }
 
 /*
@@ -64,7 +63,6 @@ void __cpuidle r4k_wait_irqoff(void)
 		"	.set	arch=r4000	\n"
 		"	wait			\n"
 		"	.set	pop		\n");
-	raw_local_irq_enable();
 }
 
 /*
@@ -84,7 +82,6 @@ static void __cpuidle rm7k_wait_irqoff(void)
 		"	wait						\n"
 		"	mtc0	$1, $12		# stalls until W stage	\n"
 		"	.set	pop					\n");
-	raw_local_irq_enable();
 }
 
 /*
@@ -256,8 +253,6 @@ void arch_cpu_idle(void)
 {
 	if (cpu_wait)
 		cpu_wait();
-	else
-		raw_local_irq_enable();
 }
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
index 50b4eb19a6cc..40236d17b601 100644
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -33,7 +33,6 @@ EXPORT_SYMBOL(pm_power_off);
 
 void arch_cpu_idle(void)
 {
-	raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
index 3c98728cce24..eba87ed6603d 100644
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -82,6 +82,7 @@ void arch_cpu_idle(void)
 	raw_local_irq_enable();
 	if (mfspr(SPR_UPR) & SPR_UPR_PMP)
 		mtspr(SPR_PMR, mfspr(SPR_PMR) | SPR_PMR_DME);
+	raw_local_irq_disable();
 }
 
 void (*pm_power_off) (void) = machine_power_off;
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index a92a23d6acd9..cb7858fcf81c 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -169,8 +169,6 @@ void __cpuidle arch_cpu_idle_dead(void)
 
 void __cpuidle arch_cpu_idle(void)
 {
-	raw_local_irq_enable();
-
 	/* nop on real hardware, qemu will idle sleep. */
 	asm volatile("or %%r10,%%r10,%%r10\n":::);
 }
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 1f835539fda4..4195311da67e 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -51,10 +51,9 @@ void arch_cpu_idle(void)
 		 * Some power_save functions return with
 		 * interrupts enabled, some don't.
 		 */
-		if (irqs_disabled())
-			raw_local_irq_enable();
+		if (!irqs_disabled())
+			raw_local_irq_disable();
 	} else {
-		raw_local_irq_enable();
 		/*
 		 * Go into low thread priority and possibly
 		 * low power mode.
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index dd5f985b1f40..886eef55645e 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -36,7 +36,6 @@ extern asmlinkage void ret_from_kernel_thread(void);
 void arch_cpu_idle(void)
 {
 	wait_for_interrupt();
-	raw_local_irq_enable();
 }
 
 void show_regs(struct pt_regs *regs)
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index 2b85096964f8..5bd8c1044d09 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
 void arch_cpu_idle(void)
 {
 	enabled_wait();
-	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_exit(void)
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index f59814983bd5..3418c40f0099 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -25,6 +25,7 @@ void default_idle(void)
 	raw_local_irq_enable();
 	/* Isn't this racy ? */
 	cpu_sleep();
+	raw_local_irq_disable();
 	clear_bl_bit();
 }
 
diff --git a/arch/sparc/kernel/leon_pmc.c b/arch/sparc/kernel/leon_pmc.c
index 396f46bca52e..6c00cbad7fb5 100644
--- a/arch/sparc/kernel/leon_pmc.c
+++ b/arch/sparc/kernel/leon_pmc.c
@@ -57,6 +57,8 @@ static void pmc_leon_idle_fixup(void)
 		"lda	[%0] %1, %%g0\n"
 		:
 		: "r"(address), "i"(ASI_LEON_BYPASS));
+
+	raw_local_irq_disable();
 }
 
 /*
@@ -70,6 +72,8 @@ static void pmc_leon_idle(void)
 
 	/* For systems without power-down, this will be no-op */
 	__asm__ __volatile__ ("wr	%g0, %asr19\n\t");
+
+	raw_local_irq_disable();
 }
 
 /* Install LEON Power Down function */
diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
index a02363735915..53c02385e352 100644
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -74,7 +74,6 @@ void arch_cpu_idle(void)
 {
 	if (sparc_idle)
 		(*sparc_idle)();
-	raw_local_irq_enable();
 }
 
 /* XXX cli/sti -> local_irq_xxx here, check this works once SMP is fixed. */
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 6f8c7822fc06..e5137da95268 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -62,7 +62,6 @@ void arch_cpu_idle(void)
 {
 	if (tlb_type != hypervisor) {
 		touch_nmi_watchdog();
-		raw_local_irq_enable();
 	} else {
 		unsigned long pstate;
 
@@ -93,6 +92,8 @@ void arch_cpu_idle(void)
 			"wrpr %0, %%g0, %%pstate"
 			: "=&r" (pstate)
 			: "i" (PSTATE_IE));
+
+		raw_local_irq_disable();
 	}
 }
 
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 9505a7e87396..c3674ac848f0 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -217,7 +217,6 @@ void arch_cpu_idle(void)
 {
 	cpu_tasks[current_thread_info()->cpu].pid = os_getpid();
 	um_idle_sleep();
-	raw_local_irq_enable();
 }
 
 int __cant_sleep(void) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 145a7ac0c19a..f0973aa101ea 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -686,6 +686,7 @@ void arch_cpu_idle(void)
 void __cpuidle default_idle(void)
 {
 	raw_safe_halt();
+	raw_local_irq_disable();
 }
 #if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
 EXPORT_SYMBOL(default_idle);
@@ -755,13 +756,7 @@ static void amd_e400_idle(void)
 
 	default_idle();
 
-	/*
-	 * The switch back from broadcast mode needs to be called with
-	 * interrupts disabled.
-	 */
-	raw_local_irq_disable();
 	tick_broadcast_exit();
-	raw_local_irq_enable();
 }
 
 /*
@@ -800,12 +795,11 @@ static __cpuidle void mwait_idle(void)
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
+		if (!need_resched()) {
 			__sti_mwait(0, 0);
-		else
-			raw_local_irq_enable();
+			raw_local_irq_disable();
+		}
 	} else {
-		raw_local_irq_enable();
 	}
 	__current_clr_polling();
 }
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 397a7de56377..1cea1d92ff12 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -120,6 +120,7 @@ void coprocessor_flush_all(struct thread_info *ti)
 void arch_cpu_idle(void)
 {
 	platform_idle();
+	raw_local_irq_disable();
 }
 
 /*
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c6932b8f4467..71a30820bfd2 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -96,8 +96,8 @@ void __cpuidle default_idle_call(void)
 		stop_critical_timings();
 
 		/*
-		 * arch_cpu_idle() is supposed to enable IRQs, however
-		 * we can't do that because of RCU and tracing.
+		 * arch_cpu_idle() is allowed to (temporarily) enable IRQs. It
+		 * will return with IRQs disabled.
 		 *
 		 * Trace IRQs enable here, then switch off RCU, and have
 		 * arch_cpu_idle() use raw_local_irq_enable(). Note that
@@ -112,12 +112,8 @@ void __cpuidle default_idle_call(void)
 		arch_cpu_idle();
 
 		/*
-		 * OK, so IRQs are enabled here, but RCU needs them disabled to
-		 * turn itself back on.. funny thing is that disabling IRQs
-		 * will cause tracing, which needs RCU. Jump through hoops to
-		 * make it 'work'.
+		 * Carefully undo the above.
 		 */
-		raw_local_irq_disable();
 		lockdep_hardirqs_off(_THIS_IP_);
 		rcu_idle_exit();
 		lockdep_hardirqs_on(_THIS_IP_);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30  8:23       ` Sven Schnelle
@ 2020-11-30 12:31         ` Sven Schnelle
  2020-11-30 12:52           ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Sven Schnelle @ 2020-11-30 12:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

Hi,

Sven Schnelle <svens@linux.ibm.com> writes:

> Hi Peter,
>
> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
>>> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> >
>>> > Yet two more places which invoke tracing from RCU disabled regions in the
>>> > idle path. Similar to the entry path the low level idle functions have to
>>> > be non-instrumentable.
>>> 
>>> This really seems less than optimal.
>>> 
>>> In particular, lookie here:
>>> 
>>> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
>>> >
>>> >                 trace_cpu_idle(1, smp_processor_id());
>>> >                 stop_critical_timings();
>>> > +
>>> > +               /*
>>> > +                * arch_cpu_idle() is supposed to enable IRQs, however
>>> > +                * we can't do that because of RCU and tracing.
>>> > +                *
>>> > +                * Trace IRQs enable here, then switch off RCU, and have
>>> > +                * arch_cpu_idle() use raw_local_irq_enable(). Note that
>>> > +                * rcu_idle_enter() relies on lockdep IRQ state, so switch that
>>> > +                * last -- this is very similar to the entry code.
>>> > +                */
>>> > +               trace_hardirqs_on_prepare();
>>> > +               lockdep_hardirqs_on_prepare(_THIS_IP_);
>>> >                 rcu_idle_enter();
>>> > +               lockdep_hardirqs_on(_THIS_IP_);
>>> > +
>>> >                 arch_cpu_idle();
>>> > +
>>> > +               /*
>>> > +                * OK, so IRQs are enabled here, but RCU needs them disabled to
>>> > +                * turn itself back on.. funny thing is that disabling IRQs
>>> > +                * will cause tracing, which needs RCU. Jump through hoops to
>>> > +                * make it 'work'.
>>> > +                */
>>> > +               raw_local_irq_disable();
>>> > +               lockdep_hardirqs_off(_THIS_IP_);
>>> >                 rcu_idle_exit();
>>> > +               lockdep_hardirqs_on(_THIS_IP_);
>>> > +               raw_local_irq_enable();
>>> > +
>>> >                 start_critical_timings();
>>> >                 trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>>> >         }
>>> 
>>> And look at what the code generation for the idle exit path is when
>>> lockdep isn't even on.
>>
>> Agreed.
>>
>> The idea was to flip all of arch_cpu_idle() to not enable interrupts.
>>
>> This is suboptimal for things like x86 where arch_cpu_idle() is
>> basically STI;HLT, but x86 isn't likely to actually use this code path
>> anyway, given all the various cpuidle drivers it has.
>>
>> Many of the other archs are now doing things like arm's:
>> wfi();raw_local_irq_enable().
>>
>> Doing that tree-wide interrupt-state flip was something I didn't want to
>> do at this late a stage, the chance of messing that up is just too high.
>>
>> After that I need to go look at flipping cpuidle, which is even more
>> 'interesting'. cpuidle_enter() has the exact same semantics, and this is
>> the code path that x86 actually uses, and here it's inconsistent at best.
>
> On s390 this introduces the following splat:
> [..]

I sent you the wrong backtrace. This is the correct one:

[    0.667491] smp: Bringing up secondary CPUs ... 
[    0.670262] random: get_random_bytes called from __warn+0x12a/0x160 with crng_init=1 
[    0.670280] ------------[ cut here ]------------ 
[    0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 rcu_irq_enter+0x7e/0xa8 
[    0.670293] Modules linked in: 
[    0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.10.0-rc6 #2263 
[    0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
[    0.670309] Krnl PSW : 0404d00180000000 0000000000d8a8da (rcu_irq_enter+0x82/0xa8) 
[    0.670318]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 
[    0.670325] Krnl GPRS: 0000000000000000 0000000080000002 0000000000000001 000000000101fcee 
[    0.670331]            0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[    0.670337]            000003e00029ff48 0000000000000000 00000000017212d8 0000000000000001 
[    0.670343]            0000000005ba0100 00000000000324bb 000003e00029fe40 000003e00029fe10
 
[    0.670358] Krnl Code: 0000000000d8a8ca: ec180013017e        cij     %r1,1,8,0000000000d8a8f0 
[    0.670358]            0000000000d8a8d0: ecb80005007e        cij     %r11,0,8,0000000000d8a8da 
[    0.670358]           #0000000000d8a8d6: af000000            mc      0,0 
[    0.670358]           >0000000000d8a8da: ebbff0a00004        lmg     %r11,%r15,160(%r15) 
[    0.670358]            0000000000d8a8e0: c0f4ffffff68        brcl    15,0000000000d8a7b0 
[    0.670358]            0000000000d8a8e6: c0e5000038c1        brasl   %r14,0000000000d91a68 
[    0.670358]            0000000000d8a8ec: a7f4ffdc            brc     15,0000000000d8a8a4 
[    0.670358]            0000000000d8a8f0: c0e5000038bc        brasl   %r14,0000000000d91a68 
[    0.670392] Call Trace: 
[    0.670396]  [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8  
[    0.670401]  [<0000000000157f9a>] irq_enter+0x22/0x30  
[    0.670404]  [<000000000010e51c>] do_IRQ+0x64/0xd0  
[    0.670408]  [<0000000000d9a65a>] ext_int_handler+0x18e/0x194  
[    0.670412]  [<0000000000d9a6a0>] psw_idle+0x40/0x48  
[    0.670416] ([<0000000000104202>] enabled_wait+0x22/0xf0) 
[    0.670419]  [<00000000001046e2>] arch_cpu_idle+0x22/0x38  
[    0.670423]  [<0000000000d986cc>] default_idle_call+0x74/0xd8  
[    0.670427]  [<000000000019a94a>] do_idle+0xf2/0x1b0  
[    0.670431]  [<000000000019ac7e>] cpu_startup_entry+0x36/0x40  
[    0.670435]  [<0000000000118b9a>] smp_start_secondary+0x82/0x88  
[    0.670438] INFO: lockdep is turned off. 
[    0.670441] Last Breaking-Event-Address: 
[    0.670444]  [<0000000000157f94>] irq_enter+0x1c/0x30 
[    0.670447] irq event stamp: 19 
[    0.670451] hardirqs last  enabled at (19): [<0000000000d98688>] default_idle_call+0x30/0xd8 
[    0.670455] hardirqs last disabled at (18): [<000000000019a9c2>] do_idle+0x16a/0x1b0 
[    0.670459] softirqs last  enabled at (0): [<000000000014c3be>] copy_process+0x48e/0x14c0 
[    0.670463] softirqs last disabled at (0): [<0000000000000000>] 0x0 
[    0.670466] ---[ end trace cba3783aedff6f79 ]--- 
[    0.670685] smp: Brought up 1 node, 2 CPUs 
[    0.676300]  
[    0.676301] ============================= 
[    0.676302] WARNING: suspicious RCU usage 
[    0.676303] 5.10.0-rc6 #2263 Not tainted 
[    0.676303] ----------------------------- 
[    0.676304] include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage! 
[    0.676305]  
[    0.676306] other info that might help us debug this: 
[    0.676306]  
[    0.676307]  
[    0.676307] rcu_scheduler_active = 1, debug_locks = 1 
[    0.676308] RCU used illegally from extended quiescent state! 
[    0.676309] no locks held by swapper/1/0. 
[    0.676309]  
[    0.676310] stack backtrace: 
[    0.676310] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc6 #2263 
[    0.676311] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
[    0.676312] Call Trace: 
[    0.676312]  [<0000000000d726f8>] show_stack+0x90/0xf8  
[    0.676313]  [<0000000000d855f2>] dump_stack+0xa2/0xd8  
[    0.676314]  [<00000000001d216e>] trace_lock_acquire+0x1ce/0x1d8  
[    0.676314]  [<00000000001d7c44>] lock_acquire+0x44/0x88  
[    0.676315]  [<0000000000d98788>] _raw_spin_lock+0x58/0xa8 
 
[    0.676316]  [<00000000001e4e94>] vprintk_emit+0x74/0x1a8  
[    0.676316]  [<00000000001e4ffe>] vprintk_default+0x36/0x48  
[    0.676317]  [<0000000000d76526>] printk+0x46/0x58  
[    0.676318]  [<00000000009ae188>] report_bug+0x110/0x130  
[    0.676318]  [<000000000010215c>] monitor_event_exception+0x44/0xc0  
[    0.676319]  [<0000000000d9a10e>] pgm_check_handler+0x1da/0x238  
[    0.676320]  [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8  
[    0.676321] ------------[ cut here ]------------ 
[    0.676321] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) 
[    0.676322] WARNING: CPU: 1 PID: 0 at kernel/locking/lockdep.c:5281 check_flags.part.0+0x1cc/0x208 
[    0.676323] Modules linked in: 
[    0.676324] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc6 #2263 
[    0.676325] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
[    0.676325] Krnl PSW : 0404c00180000000 00000000001d79d0 (check_flags.part.0+0x1d0/0x208) 
[    0.676327]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 
[    0.676329] Krnl GPRS: 00000000ffffffe5 000000000116dbd4 000000000000002f 0000000000fe8c00 
[    0.676330]            0000000000000000 00000000009ad0e8 0000000000000000 0000000000000000 
[    0.676330]            00000000012d7f20 0000000000000000 040003e000000001 000000000257f310 
[    0.676331]            0000000005ba0100 00000000011725b0 00000000001d79cc 000003e00029f638 
[    0.676332] Krnl Code: 00000000001d79c0: c02000708989        larl    %r2,0000000000fe8cd2 
[    0.676333]            00000000001d79c6: c0e5005cde09        brasl   %r14,0000000000d735d8 
[    0.676334]           #00000000001d79cc: af000000            mc      0,0 
[    0.676335]           >00000000001d79d0: c0200070d844        larl    %r2,0000000000ff2a58 
[    0.676337]            00000000001d79d6: c0e5005cf585        brasl   %r14,0000000000d764e0 
[    0.676338]            00000000001d79dc: a7f4ff3f            brc     15,00000000001d785a 
[    0.676339]            00000000001d79e0: c0e5005dd044        brasl   %r14,0000000000d91a68 
[    0.676340]            00000000001d79e6: a7f4ff6f            brc     15,00000000001d78c4 
[    0.676341] Call Trace: 
[    0.676342]  [<00000000001d79d0>] check_flags.part.0+0x1d0/0x208  
[    0.676343] ([<00000000001d79cc>] check_flags.part.0+0x1cc/0x208) 
[    0.676344]  [<00000000001d7a9e>] lock_acquire.part.0+0x96/0x1f8  
[    0.676344]  [<00000000001d7c7c>] lock_acquire+0x7c/0x88  
[    0.676345]  [<000000000027bce4>] __is_insn_slot_addr+0x5c/0x150  
[    0.676346]  [<000000000017dbf2>] kernel_text_address+0x11a/0x130  
[    0.676346]  [<000000000017dc2e>] __kernel_text_address+0x26/0x70  
[    0.676347]  [<000000000011843c>] unwind_next_frame+0x104/0x1c0  
[    0.676348]  [<0000000000d72738>] show_stack+0xd0/0xf8  
[    0.676348]  [<0000000000d855f2>] dump_stack+0xa2/0xd8  
[    0.676349]  [<00000000001d216e>] trace_lock_acquire+0x1ce/0x1d8  
[    0.676350]  [<00000000001d7c44>] lock_acquire+0x44/0x88  
[    0.676350]  [<0000000000d98788>] _raw_spin_lock+0x58/0xa8  
[    0.676351]  [<00000000001e4e94>] vprintk_emit+0x74/0x1a8  
[    0.676352]  [<00000000001e4ffe>] vprintk_default+0x36/0x48  
[    0.676352]  [<0000000000d76526>] printk+0x46/0x58  
[    0.676353]  [<00000000009ae188>] report_bug+0x110/0x130  
[    0.676354]  [<000000000010215c>] monitor_event_exception+0x44/0xc0  
[    0.676354]  [<0000000000d9a10e>] pgm_check_handler+0x1da/0x238  
[    0.676355]  [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8  
[    0.676356]  [<0000000000157f9a>] irq_enter+0x22/0x30  
[    0.676356]  [<000000000010e51c>] do_IRQ+0x64/0xd0  
[    0.676357]  [<0000000000d9a65a>] ext_int_handler+0x18e/0x194  
[    0.676358]  [<0000000000d9a6a0>] psw_idle+0x40/0x48  
[    0.676358] ([<0000000000104202>] enabled_wait+0x22/0xf0) 
[    0.676359]  [<00000000001046e2>] arch_cpu_idle+0x22/0x38  
[    0.676360]  [<0000000000d986cc>] default_idle_call+0x74/0xd8  
[    0.676360]  [<000000000019a94a>] do_idle+0xf2/0x1b0  
[    0.676361]  [<000000000019ac7e>] cpu_startup_entry+0x36/0x40  
[    0.676362]  [<0000000000118b9a>] smp_start_secondary+0x82/0x88  
[    0.676362] INFO: lockdep is turned off. 
[    0.676363] Last Breaking-Event-Address: 
[    0.676364]  [<0000000000d9c4d0>] __s390_indirect_jump_r14+0x0/0xc 
[    0.676364] irq event stamp: 19 
[    0.676365] hardirqs last  enabled at (19): [<0000000000d98688>] default_idle_call+0x30/0xd8 
[    0.676366] hardirqs last disabled at (18): [<000000000019a9c2>] do_idle+0x16a/0x1b0 
[    0.676367] softirqs last  enabled at (0): [<000000000014c3be>] copy_process+0x48e/0x14c0 
[    0.676367] softirqs last disabled at (0): [<0000000000000000>] 0x0 
[    0.676368] ---[ end trace cba3783aedff6f78 ]--- 
[    0.676369] possible reason: unannotated irqs-off. 
[    0.676369] irq event stamp: 19 
[    0.676370] hardirqs last  enabled at (19): [<0000000000d98688>] default_idle_call+0x30/0xd8 
[    0.676371] hardirqs last disabled at (18): [<000000000019a9c2>] do_idle+0x16a/0x1b0 
[    0.676372] softirqs last  enabled at (0): [<000000000014c3be>] copy_process+0x48e/0x14c0 
[    0.676372] softirqs last disabled at (0): [<0000000000000000>] 0x0 
[    0.676373]  [<0000000000157f9a>] irq_enter+0x22/0x30  
[    0.676374]  [<000000000010e51c>] do_IRQ+0x64/0xd0  
[    0.676374]  [<0000000000d9a65a>] ext_int_handler+0x18e/0x194  
[    0.676375]  [<0000000000d9a6a0>] psw_idle+0x40/0x48  
[    0.676376] ([<0000000000104202>] enabled_wait+0x22/0xf0) 
[    0.676376]  [<00000000001046e2>] arch_cpu_idle+0x22/0x38  
[    0.676377]  [<0000000000d986cc>] default_idle_call+0x74/0xd8  
[    0.676378]  [<000000000019a94a>] do_idle+0xf2/0x1b0  
[    0.676378]  [<000000000019ac7e>] cpu_startup_entry+0x36/0x40  
[    0.676379]  [<0000000000118b9a>] smp_start_secondary+0x82/0x88  
[    0.676379] INFO: lockdep is turned off. 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 12:31         ` Sven Schnelle
@ 2020-11-30 12:52           ` Peter Zijlstra
  2020-11-30 13:03             ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-11-30 12:52 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
> [    0.670280] ------------[ cut here ]------------ 
> [    0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 rcu_irq_enter+0x7e/0xa8 
> [    0.670293] Modules linked in: 
> [    0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.10.0-rc6 #2263 
> [    0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
> [    0.670309] Krnl PSW : 0404d00180000000 0000000000d8a8da (rcu_irq_enter+0x82/0xa8) 
> [    0.670318]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 
> [    0.670325] Krnl GPRS: 0000000000000000 0000000080000002 0000000000000001 000000000101fcee 
> [    0.670331]            0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [    0.670337]            000003e00029ff48 0000000000000000 00000000017212d8 0000000000000001 
> [    0.670343]            0000000005ba0100 00000000000324bb 000003e00029fe40 000003e00029fe10
>  
> [    0.670358] Krnl Code: 0000000000d8a8ca: ec180013017e        cij     %r1,1,8,0000000000d8a8f0 
> [    0.670358]            0000000000d8a8d0: ecb80005007e        cij     %r11,0,8,0000000000d8a8da 
> [    0.670358]           #0000000000d8a8d6: af000000            mc      0,0 
> [    0.670358]           >0000000000d8a8da: ebbff0a00004        lmg     %r11,%r15,160(%r15) 
> [    0.670358]            0000000000d8a8e0: c0f4ffffff68        brcl    15,0000000000d8a7b0 
> [    0.670358]            0000000000d8a8e6: c0e5000038c1        brasl   %r14,0000000000d91a68 
> [    0.670358]            0000000000d8a8ec: a7f4ffdc            brc     15,0000000000d8a8a4 
> [    0.670358]            0000000000d8a8f0: c0e5000038bc        brasl   %r14,0000000000d91a68 
> [    0.670392] Call Trace: 
> [    0.670396]  [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8  
> [    0.670401]  [<0000000000157f9a>] irq_enter+0x22/0x30  
> [    0.670404]  [<000000000010e51c>] do_IRQ+0x64/0xd0  
> [    0.670408]  [<0000000000d9a65a>] ext_int_handler+0x18e/0x194  
> [    0.670412]  [<0000000000d9a6a0>] psw_idle+0x40/0x48  
> [    0.670416] ([<0000000000104202>] enabled_wait+0x22/0xf0) 
> [    0.670419]  [<00000000001046e2>] arch_cpu_idle+0x22/0x38  
> [    0.670423]  [<0000000000d986cc>] default_idle_call+0x74/0xd8  
> [    0.670427]  [<000000000019a94a>] do_idle+0xf2/0x1b0  
> [    0.670431]  [<000000000019ac7e>] cpu_startup_entry+0x36/0x40  
> [    0.670435]  [<0000000000118b9a>] smp_start_secondary+0x82/0x88  

But but but...

  do_idle()			# IRQs on
    local_irq_disable();	# IRQs off
    default_idle_call()		# IRQs off
      arch_cpu_idle()		# IRQs off
        enabled_wait()		# IRQs off
	  raw_local_save()	# still off
	  psw_idle()		# very much off
	    ext_int_handler	# get an interrupt ?!?!

Help?
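
The sequence above boils down to a tiny C model (purely illustrative; every
name and body here is a stand-in, not the real kernel code): lockdep's
hardirq bookkeeping is flipped on before the hardware state is, so the
interrupt taken in psw_idle() runs with the two out of sync.

```c
#include <stdbool.h>

/* Simulated state: what the CPU actually has vs. what lockdep believes. */
static bool hw_irqs_on;       /* real PSW interrupt mask         */
static bool lockdep_irqs_on;  /* lockdep's tracked hardirq state */

static void local_irq_disable(void)   { hw_irqs_on = false; lockdep_irqs_on = false; }
static void lockdep_hardirqs_on(void) { lockdep_irqs_on = true; /* hw untouched! */ }

/* psw_idle(): hardware atomically enables IRQs and waits; an external
 * interrupt is taken here. Returns true when lockdep and hardware
 * disagree at the moment the interrupt handler runs. */
static bool take_interrupt_in_idle(void)
{
    hw_irqs_on = true;   /* lpswe loads an enabled wait PSW          */
    hw_irqs_on = false;  /* interrupt entry masks IRQs again ...     */
    /* ... but nothing told lockdep, so rcu_irq_enter() runs with    */
    /* lockdep still believing IRQs are on -> DEBUG_LOCKS_WARN_ON.   */
    return lockdep_irqs_on != hw_irqs_on;
}

static bool default_idle_call_model(void)
{
    local_irq_disable();    /* do_idle(): IRQs off                */
    lockdep_hardirqs_on();  /* lockdep now thinks they're on      */
    return take_interrupt_in_idle();
}
```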

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 12:52           ` Peter Zijlstra
@ 2020-11-30 13:03             ` Peter Zijlstra
  2020-11-30 18:04               ` Linus Torvalds
                                 ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-11-30 13:03 UTC (permalink / raw)
  To: Sven Schnelle
  Cc: Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Nov 30, 2020 at 01:52:11PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
> > [    0.670280] ------------[ cut here ]------------ 
> > [    0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 rcu_irq_enter+0x7e/0xa8 
> > [    0.670293] Modules linked in: 
> > [    0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.10.0-rc6 #2263 
> > [    0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
> > [    0.670309] Krnl PSW : 0404d00180000000 0000000000d8a8da (rcu_irq_enter+0x82/0xa8) 
> > [    0.670318]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 
> > [    0.670325] Krnl GPRS: 0000000000000000 0000000080000002 0000000000000001 000000000101fcee 
> > [    0.670331]            0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> > [    0.670337]            000003e00029ff48 0000000000000000 00000000017212d8 0000000000000001 
> > [    0.670343]            0000000005ba0100 00000000000324bb 000003e00029fe40 000003e00029fe10
> >  
> > [    0.670358] Krnl Code: 0000000000d8a8ca: ec180013017e        cij     %r1,1,8,0000000000d8a8f0 
> > [    0.670358]            0000000000d8a8d0: ecb80005007e        cij     %r11,0,8,0000000000d8a8da 
> > [    0.670358]           #0000000000d8a8d6: af000000            mc      0,0 
> > [    0.670358]           >0000000000d8a8da: ebbff0a00004        lmg     %r11,%r15,160(%r15) 
> > [    0.670358]            0000000000d8a8e0: c0f4ffffff68        brcl    15,0000000000d8a7b0 
> > [    0.670358]            0000000000d8a8e6: c0e5000038c1        brasl   %r14,0000000000d91a68 
> > [    0.670358]            0000000000d8a8ec: a7f4ffdc            brc     15,0000000000d8a8a4 
> > [    0.670358]            0000000000d8a8f0: c0e5000038bc        brasl   %r14,0000000000d91a68 
> > [    0.670392] Call Trace: 
> > [    0.670396]  [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8  
> > [    0.670401]  [<0000000000157f9a>] irq_enter+0x22/0x30  
> > [    0.670404]  [<000000000010e51c>] do_IRQ+0x64/0xd0  
> > [    0.670408]  [<0000000000d9a65a>] ext_int_handler+0x18e/0x194  
> > [    0.670412]  [<0000000000d9a6a0>] psw_idle+0x40/0x48  
> > [    0.670416] ([<0000000000104202>] enabled_wait+0x22/0xf0) 
> > [    0.670419]  [<00000000001046e2>] arch_cpu_idle+0x22/0x38  
> > [    0.670423]  [<0000000000d986cc>] default_idle_call+0x74/0xd8  
> > [    0.670427]  [<000000000019a94a>] do_idle+0xf2/0x1b0  
> > [    0.670431]  [<000000000019ac7e>] cpu_startup_entry+0x36/0x40  
> > [    0.670435]  [<0000000000118b9a>] smp_start_secondary+0x82/0x88  
> 
> But but but...
> 
>   do_idle()			# IRQs on
>     local_irq_disable();	# IRQs off
>     default_idle_call()	# IRQs off
	lockdep_hardirqs_on();	# IRQs off, but lockdep thinks they're on
>       arch_cpu_idle()		# IRQs off
>         enabled_wait()	# IRQs off
> 	  raw_local_save()	# still off
> 	  psw_idle()		# very much off
> 	    ext_int_handler	# get an interrupt ?!?!
	      rcu_irq_enter()	# lockdep thinks IRQs are on <- FAIL


I can't much read s390 assembler, but ext_int_handler() has a
TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
with the actual state, but there's some condition before it, what's that
test and is that right?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30  7:56     ` Peter Zijlstra
  2020-11-30  8:23       ` Sven Schnelle
  2020-11-30  8:36       ` Peter Zijlstra
@ 2020-11-30 17:55       ` Linus Torvalds
  2020-12-01  7:39         ` Peter Zijlstra
  2 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2020-11-30 17:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Sun, Nov 29, 2020 at 11:57 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> The idea was to flip all of arch_cpu_idle() to not enable interrupts.

I don't think that's realistic.

> This is suboptimal for things like x86 where arch_cpu_idle() is
> basically STI;HLT, but x86 isn't likely to actually use this code path
> anyway, given all the various cpuidle drivers it has.

Well, but the thing is, "enable interrupts" is pretty much fundamental
to any idle routine.

Idling with interrupts disabled is not a sensible operation. The fact
that on x86 the sequence is "sti;hlt" is not just some x86 oddity,
it's basically fundamental to the whole notion of idle.

Yes, yes, I can very well imagine some hardware doing a "idle until
you sense an interrupt, but don't actually take it". It's not
_impossible_. But it's certainly not normal.

So I think it's completely misguided to think that the default idle
routine should assume that arch_cpu_idle() wouldn't enable interrupts.

             Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 13:03             ` Peter Zijlstra
@ 2020-11-30 18:04               ` Linus Torvalds
  2020-11-30 19:31                 ` Christian Borntraeger
  2020-12-01  7:48               ` Sven Schnelle
  2020-12-02  7:54               ` Heiko Carstens
  2 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2020-11-30 18:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sven Schnelle, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > But but but...
> >
> >   do_idle()                   # IRQs on
> >     local_irq_disable();      # IRQs off
> >     default_idle_call()       # IRQs off
>         lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> >       arch_cpu_idle()         # IRQs off
> >         enabled_wait()        # IRQs off
> >         raw_local_save()      # still off
> >         psw_idle()            # very much off
> >           ext_int_handler     # get an interrupt ?!?!
>               rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
>
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

I think that "psw_idle()" enables interrupts, exactly like x86 does.
See my previous email.

But no, I can't read s390 asm either. IBM is notorious for making up
odd IBM-only incomprehensible names. When "oi" means "or immediate", I
personally start suspecting that there were some "happy drugs"
involved.

To make matters worse, some of the assembly code in psw_idle isn't
even assembly code, it's machine code, with "BPON" being an
alternative instruction definition with just the hex encoding for the
machine code instruction rather than any actual human-legible
instruction encoding.

Of course, when the "human-legible" instructions are "oi", I guess hex
codes aren't all that much less legible..

s390 programmers must be some super-human breed. Or, alternatively,
they are munching happy pills by the truck-load to get over the pain
;)

           Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 18:04               ` Linus Torvalds
@ 2020-11-30 19:31                 ` Christian Borntraeger
  2020-12-01  8:07                   ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Christian Borntraeger @ 2020-11-30 19:31 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: Sven Schnelle, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers



On 30.11.20 19:04, Linus Torvalds wrote:
> On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>> But but but...
>>>
>>>   do_idle()                   # IRQs on
>>>     local_irq_disable();      # IRQs off
>>>     default_idle_call()       # IRQs off
>>         lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
>>>       arch_cpu_idle()         # IRQs off
>>>         enabled_wait()        # IRQs off
>>>         raw_local_save()      # still off
>>>         psw_idle()            # very much off
>>>           ext_int_handler     # get an interrupt ?!?!
>>               rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
>>
>> I can't much read s390 assembler, but ext_int_handler() has a
>> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
>> with the actual state, but there's some condition before it, what's that
>> test and is that right?
> 
> I think that "psw_idle()" enables interrupts, exactly like x86 does.

Yes, by definition.  Otherwise it would be a software error state.
The interesting part is the lpswe instruction at the end (load PSW),
which loads the full PSW. The PSW contains interrupt enablement, the wait
bit, the condition code, paging enablement, machine check enablement, the
instruction address, and others. The idle PSW is enabled for interrupts
and has the wait bit set. If the wait bit is set and interrupts are off,
this is called "disabled wait" and is used for panic, shutdown, etc.
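
As a sketch of that decoding (bit positions are an assumption from memory,
6 = I/O mask, 7 = external mask, 14 = wait; the Principles of Operation is
the authoritative reference):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbering follows the architecture convention: bit 0 is the most
 * significant bit of the 64-bit PSW. */
static bool psw_bit(uint64_t psw, int n) { return (psw >> (63 - n)) & 1; }

static bool psw_io_mask(uint64_t psw)  { return psw_bit(psw, 6);  }
static bool psw_ext_mask(uint64_t psw) { return psw_bit(psw, 7);  }
static bool psw_wait(uint64_t psw)     { return psw_bit(psw, 14); }

/* "Disabled wait": wait bit set while interrupts are masked -- the
 * panic/shutdown state described above. */
static bool psw_is_disabled_wait(uint64_t psw)
{
    return psw_wait(psw) && !psw_io_mask(psw) && !psw_ext_mask(psw);
}
```

Feeding it the Krnl PSW from the warning above (0x0404c00180000000) gives
W:0 IO:0 EX:0, matching the flags the kernel itself decodes in the trace.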

> See my previous email.
> 
> But no, I can't read s390 asm either. IBM is notorious for making up
> odd IBM-only incomprehensible names. When "oi" means "or immediate", I
> personally start suspecting that there were some "happy drugs"
> involved.
> 
> To make matters worse, some of the assembly code in psw_idle isn't
> even assembly code, it's machine code, with "BPON" being an
> alternative instruction definition with just the hex encoding for the
> machine code instruction rather than any actual human-legible
> instruction encoding.
> 
> Of course, when the "human-legible" instructions are "oi", I guess hex
> codes aren't all that much less legible..
> 
> s390 programmers must be some super-human breed. Or, alternatively,
> they are munching happy pills by the truck-load to get over the pain
> ;)


For something that was defined in 1964 it is certainly not too bad.
And for the oddities, you simply get used to it.

In the end the scheme is actually relatively straightforward. 
The first character defines what it is:
o: or
a: add
n: and
l: load	
i: insert (kind of a load to parts, e.g. only one byte)
st: store
b: branch
j: jump
and some more, but those should cover 80% of what you need.
The following characters then tell whether it is 32 or 64 bit (g), an
immediate (i) value or memory, and the type of register (float, general
purpose). So lg is a 64-bit load, l a 32-bit load, and ld a load into a
floating point register (double).
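
A toy illustration of that scheme (deliberately simplified; real mnemonics
have many more prefixes and suffixes than this covers):

```c
#include <string.h>

/* Map the leading letter of a mnemonic to the operation it names,
 * following the scheme described above. Illustrative only. */
static const char *s390_op(char c)
{
    switch (c) {
    case 'o': return "or";
    case 'a': return "add";
    case 'n': return "and";
    case 'l': return "load";
    case 'i': return "insert";
    case 'b': return "branch";
    case 'j': return "jump";
    default:  return "?";
    }
}

/* A trailing 'g' marks the 64-bit variant, e.g. l vs. lg. */
static int s390_is_64bit(const char *mn)
{
    size_t len = strlen(mn);
    return len > 1 && mn[len - 1] == 'g';
}
```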

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 17:55       ` Linus Torvalds
@ 2020-12-01  7:39         ` Peter Zijlstra
  2020-12-01  7:56           ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01  7:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Nov 30, 2020 at 09:55:44AM -0800, Linus Torvalds wrote:

> Yes, yes, I can very well imagine some hardware doing a "idle until
> you sense an interrupt, but don't actually take it". It's not
> _impossible_. But it's certainly not normal.

A lot of hardware actually does that, including modern x86. But yes, not
nearly everything.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 13:03             ` Peter Zijlstra
  2020-11-30 18:04               ` Linus Torvalds
@ 2020-12-01  7:48               ` Sven Schnelle
  2020-12-02  7:54               ` Heiko Carstens
  2 siblings, 0 replies; 38+ messages in thread
From: Sven Schnelle @ 2020-12-01  7:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

Hi Peter,

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Nov 30, 2020 at 01:52:11PM +0100, Peter Zijlstra wrote:
>> On Mon, Nov 30, 2020 at 01:31:33PM +0100, Sven Schnelle wrote:
>> > [    0.670280] ------------[ cut here ]------------ 
>> > [    0.670288] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:1054 rcu_irq_enter+0x7e/0xa8 
>> > [    0.670293] Modules linked in: 
>> > [    0.670299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W         5.10.0-rc6 #2263 
>> > [    0.670304] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0) 
>> > [    0.670309] Krnl PSW : 0404d00180000000 0000000000d8a8da (rcu_irq_enter+0x82/0xa8) 
>> > [    0.670318]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 
>> > [    0.670325] Krnl GPRS: 0000000000000000 0000000080000002 0000000000000001 000000000101fcee 
>> > [    0.670331]            0000000000000000 0000000000000000 0000000000000000 0000000000000000 
>> > [    0.670337]            000003e00029ff48 0000000000000000 00000000017212d8 0000000000000001 
>> > [    0.670343]            0000000005ba0100 00000000000324bb 000003e00029fe40 000003e00029fe10
>> >  
>> > [    0.670358] Krnl Code: 0000000000d8a8ca: ec180013017e        cij     %r1,1,8,0000000000d8a8f0 
>> > [    0.670358]            0000000000d8a8d0: ecb80005007e        cij     %r11,0,8,0000000000d8a8da 
>> > [    0.670358]           #0000000000d8a8d6: af000000            mc      0,0 
>> > [    0.670358]           >0000000000d8a8da: ebbff0a00004        lmg     %r11,%r15,160(%r15) 
>> > [    0.670358]            0000000000d8a8e0: c0f4ffffff68        brcl    15,0000000000d8a7b0 
>> > [    0.670358]            0000000000d8a8e6: c0e5000038c1        brasl   %r14,0000000000d91a68 
>> > [    0.670358]            0000000000d8a8ec: a7f4ffdc            brc     15,0000000000d8a8a4 
>> > [    0.670358]            0000000000d8a8f0: c0e5000038bc        brasl   %r14,0000000000d91a68 
>> > [    0.670392] Call Trace: 
>> > [    0.670396]  [<0000000000d8a8da>] rcu_irq_enter+0x82/0xa8  
>> > [    0.670401]  [<0000000000157f9a>] irq_enter+0x22/0x30  
>> > [    0.670404]  [<000000000010e51c>] do_IRQ+0x64/0xd0  
>> > [    0.670408]  [<0000000000d9a65a>] ext_int_handler+0x18e/0x194  
>> > [    0.670412]  [<0000000000d9a6a0>] psw_idle+0x40/0x48  
>> > [    0.670416] ([<0000000000104202>] enabled_wait+0x22/0xf0) 
>> > [    0.670419]  [<00000000001046e2>] arch_cpu_idle+0x22/0x38  
>> > [    0.670423]  [<0000000000d986cc>] default_idle_call+0x74/0xd8  
>> > [    0.670427]  [<000000000019a94a>] do_idle+0xf2/0x1b0  
>> > [    0.670431]  [<000000000019ac7e>] cpu_startup_entry+0x36/0x40  
>> > [    0.670435]  [<0000000000118b9a>] smp_start_secondary+0x82/0x88  
>> 
>> But but but...
>> 
>>   do_idle()			# IRQs on
>>     local_irq_disable();	# IRQs off
>>     default_idle_call()	# IRQs off
> 	lockdep_hardirqs_on();	# IRQs off, but lockdep thinks they're on
>>       arch_cpu_idle()		# IRQs off
>>         enabled_wait()	# IRQs off
>> 	  raw_local_save()	# still off
>> 	  psw_idle()		# very much off
>> 	    ext_int_handler	# get an interrupt ?!?!
> 	      rcu_irq_enter()	# lockdep thinks IRQs are on <- FAIL
>
>
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

That test was introduced to track only changes in IRQ state, because of
recursion problems in lockdep. This no longer seems to work. We
probably could remove it, as lockdep can now handle recursion much
better with all the recent changes, but given that we're already at
-rc6, I don't want to touch entry.S code because of a debugging feature.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01  7:39         ` Peter Zijlstra
@ 2020-12-01  7:56           ` Peter Zijlstra
  2020-12-01 19:45             ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01  7:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 08:39:06AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 09:55:44AM -0800, Linus Torvalds wrote:
> 
> > Yes, yes, I can very well imagine some hardware doing a "idle until
> > you sense an interrupt, but don't actually take it". It's not
> > _impossible_. But it's certainly not normal.
> 
> A lot of hardware actually does that, including modern x86. But yes, not
> nearly everything.

So the thing is that (with the exception of RCU-tiny) we need interrupts
disabled when coming out of idle to prod RCU back into action.

So even if an architecture needs to enable interrupts on idle, we need
them disabled again when coming out. So we might as well have the arch
idle routine be: STI; HLT; CLI; because then architectures that can
idle with interrupts disabled can avoid mucking about with the interrupt
state entirely.

Currently they (and this includes much of ARM) are effectively doing:

	wfi();
	raw_local_irq_enable();
	raw_local_irq_disable();

where a plain: wfi(), would be sufficient.
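
That redundancy can be made concrete with a small counter model (all of
these functions are stand-ins, not real arch code):

```c
/* Count interrupt-state transitions for the two idle shapes compared
 * above. */
static int irq_toggles;

static void raw_local_irq_enable(void)  { irq_toggles++; }
static void raw_local_irq_disable(void) { irq_toggles++; }
static void wfi(void) { /* wait-for-interrupt; wakes without unmasking */ }

static int idle_current_shape(void)   /* today's pattern */
{
    irq_toggles = 0;
    wfi();
    raw_local_irq_enable();   /* pointless flip ... */
    raw_local_irq_disable();  /* ... and flip back  */
    return irq_toggles;
}

static int idle_proposed_shape(void)  /* a plain wfi() */
{
    irq_toggles = 0;
    wfi();
    return irq_toggles;
}
```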


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 19:31                 ` Christian Borntraeger
@ 2020-12-01  8:07                   ` Peter Zijlstra
  2020-12-01 11:07                     ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01  8:07 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Linus Torvalds, Sven Schnelle, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Nov 30, 2020 at 08:31:32PM +0100, Christian Borntraeger wrote:
> On 30.11.20 19:04, Linus Torvalds wrote:
> > On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> >>> But but but...
> >>>
> >>>   do_idle()                   # IRQs on
> >>>     local_irq_disable();      # IRQs off
> >>>     default_idle_call()       # IRQs off
> >>         lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> >>>       arch_cpu_idle()         # IRQs off
> >>>         enabled_wait()        # IRQs off
> >>>         raw_local_save()      # still off
> >>>         psw_idle()            # very much off
> >>>           ext_int_handler     # get an interrupt ?!?!
> >>               rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> >>
> >> I can't much read s390 assembler, but ext_int_handler() has a
> >> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> >> with the actual state, but there's some condition before it, what's that
> >> test and is that right?
> > 
> > I think that "psw_idle()" enables interrupts, exactly like x86 does.

(like ye olde x86, modern x86 idles with interrupts disabled)

> Yes, by definition.  Otherwise it would be a software error state.
> The interesting part is the lpswe instruction at the end (load PSW) 
> which loads the full PSW, which contains interrupt enablement, wait bit,
> condition code, paging enablement, machine check enablement the address
> and others. The idle psw is enabled for interrupts and has the wait bit
> set. If the wait bit is set and interrupts are off this is called "disabled
> wait" and is used for panic, shutdown etc. 

OK, but at that point, hardware interrupt state is on, lockdep thinks
it's on. And we take an interrupt, just like any old regular interrupt
enabled region.

But then the exception handler (ext_int_handler), which I'm assuming is
run by the hardware with hardware interrupts disabled again, should be
calling into lockdep to tell it interrupts were disabled. IOW that
TRACE_IRQS_OFF bit in there.

But that doesn't seem to be working right. Why? Because afaict this is
then the exact normal flow of things, but it's only going sideways
during this idle thing.

What's going 'funny' ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01  8:07                   ` Peter Zijlstra
@ 2020-12-01 11:07                     ` Peter Zijlstra
  2020-12-01 14:46                       ` Paul E. McKenney
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01 11:07 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Linus Torvalds, Sven Schnelle, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Dec 01, 2020 at 09:07:34AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 30, 2020 at 08:31:32PM +0100, Christian Borntraeger wrote:
> > On 30.11.20 19:04, Linus Torvalds wrote:
> > > On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >>
> > >>> But but but...
> > >>>
> > >>>   do_idle()                   # IRQs on
> > >>>     local_irq_disable();      # IRQs off
> > >>>     default_idle_call()       # IRQs off
> > >>         lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> > >>>       arch_cpu_idle()         # IRQs off
> > >>>         enabled_wait()        # IRQs off
> > >>>         raw_local_save()      # still off
> > >>>         psw_idle()            # very much off
> > >>>           ext_int_handler     # get an interrupt ?!?!
> > >>               rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> > >>
> > >> I can't much read s390 assembler, but ext_int_handler() has a
> > >> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > >> with the actual state, but there's some condition before it, what's that
> > >> test and is that right?
> > > 
> > > I think that "psw_idle()" enables interrupts, exactly like x86 does.
> 
> (like ye olde x86, modern x86 idles with interrupts disabled)
> 
> > Yes, by definition.  Otherwise it would be a software error state.
> > The interesting part is the lpswe instruction at the end (load PSW) 
> > which loads the full PSW, which contains interrupt enablement, wait bit,
> > condition code, paging enablement, machine check enablement the address
> > and others. The idle psw is enabled for interrupts and has the wait bit
> > set. If the wait bit is set and interrupts are off this is called "disabled
> > wait" and is used for panic, shutdown etc. 
> 
> OK, but at that point, hardware interrupt state is on, lockdep thinks
> it's on. And we take an interrupt, just like any old regular interrupt
> enabled region.
> 
> But then the exception handler (ext_int_handler), which I'm assuming is
> run by the hardware with hardware interrupts disabled again, should be
> calling into lockdep to tell interrupts were disabled. IOW that
> TRACE_IRQS_OFF bit in there.
> 
> But that doesn't seem to be working right. Why? Because afaict this is
> then the exact normal flow of things, but it's only going sideways
> during this idle thing.
> 
> What's going 'funny' ?

So after having talked to Sven a bit, the thing that is happening is
that this is the one place where we take interrupts with RCU
disabled. Normally RCU is watching and all is well, except during idle.
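
In other words, idle is the only window where an interrupt arrives while
RCU is not watching. A minimal simulation of that bookkeeping (the names
echo the kernel API, but the bodies are illustrative only):

```c
#include <stdbool.h>

/* Minimal model of RCU's view of a CPU: watching everywhere except
 * inside the idle loop. */
static bool rcu_watching;

static void rcu_idle_enter(void) { rcu_watching = false; } /* enter idle */
static void rcu_idle_exit(void)  { rcu_watching = true;  } /* leave idle */

/* Interrupt entry only has to switch RCU back on in the idle case:
 * everywhere else RCU is already watching, which is why only the idle
 * path trips over the ext_int_handler/lockdep interaction. */
static bool interrupt_needs_rcu_irq_enter(bool during_idle)
{
    rcu_idle_exit();              /* start from the normal state */
    if (during_idle)
        rcu_idle_enter();
    return !rcu_watching;         /* true => must call rcu_irq_enter() */
}
```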


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 11:07                     ` Peter Zijlstra
@ 2020-12-01 14:46                       ` Paul E. McKenney
  2020-12-01 14:55                         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Paul E. McKenney @ 2020-12-01 14:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian Borntraeger, Linus Torvalds, Sven Schnelle,
	Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 12:07:24PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 09:07:34AM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 30, 2020 at 08:31:32PM +0100, Christian Borntraeger wrote:
> > > On 30.11.20 19:04, Linus Torvalds wrote:
> > > > On Mon, Nov 30, 2020 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >>
> > > >>> But but but...
> > > >>>
> > > >>>   do_idle()                   # IRQs on
> > > >>>     local_irq_disable();      # IRQs off
> > > >>>     default_idle_call()       # IRQs off
> > > >>         lockdep_hardirqs_on();  # IRQs off, but lockdep thinks they're on
> > > >>>       arch_cpu_idle()         # IRQs off
> > > >>>         enabled_wait()        # IRQs off
> > > >>>         raw_local_save()      # still off
> > > >>>         psw_idle()            # very much off
> > > >>>           ext_int_handler     # get an interrupt ?!?!
> > > >>               rcu_irq_enter()   # lockdep thinks IRQs are on <- FAIL
> > > >>
> > > >> I can't much read s390 assembler, but ext_int_handler() has a
> > > >> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > > >> with the actual state, but there's some condition before it, what's that
> > > >> test and is that right?
> > > > 
> > > > I think that "psw_idle()" enables interrupts, exactly like x86 does.
> > 
> > (like ye olde x86, modern x86 idles with interrupts disabled)
> > 
> > > Yes, by definition.  Otherwise it would be a software error state.
> > > The interesting part is the lpswe instruction at the end (load PSW) 
> > > which loads the full PSW, which contains interrupt enablement, wait bit,
> > > condition code, paging enablement, machine check enablement, the address
> > > and others. The idle psw is enabled for interrupts and has the wait bit
> > > set. If the wait bit is set and interrupts are off this is called "disabled
> > > wait" and is used for panic, shutdown etc. 
> > 
> > OK, but at that point, hardware interrupt state is on, lockdep thinks
> > it's on. And we take an interrupt, just like any old regular interrupt
> > enabled region.
> > 
> > But then the exception handler (ext_int_handler), which I'm assuming is
> > run by the hardware with hardware interrupts disabled again, should be
> > calling into lockdep to tell interrupts were disabled. IOW that
> > TRACE_IRQS_OFF bit in there.
> > 
> > But that doesn't seem to be working right. Why? Because afaict this is
> > then the exact normal flow of things, but it's only going sideways
> > during this idle thing.
> > 
> > What's going 'funny' ?
> 
> So after having talked to Sven a bit, the thing that is happening, is
> that this is the one place where we take interrupts with RCU being
> disabled. Normally RCU is watching and all is well, except during idle.

Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
Or did this fall victim to recent optimizations?

							Thanx, Paul


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 14:46                       ` Paul E. McKenney
@ 2020-12-01 14:55                         ` Peter Zijlstra
  2020-12-01 16:53                           ` Paul E. McKenney
  2020-12-01 18:15                           ` Peter Zijlstra
  0 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01 14:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christian Borntraeger, Linus Torvalds, Sven Schnelle,
	Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:

> > So after having talked to Sven a bit, the thing that is happening, is
> > that this is the one place where we take interrupts with RCU being
> > disabled. Normally RCU is watching and all is well, except during idle.
> 
> Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> Or did this fall victim to recent optimizations?

It does, but the problem is that s390 is still using
trace_hardirqs_off(), which calls into tracing before RCU is enabled.

The entry order between lockdep, rcu and tracing is critical.

You can't call into tracing without RCU running,
you can't call into RCU without lockdep setup,
you can't call the (old) lockdep setup without landing in tracing.




* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 14:55                         ` Peter Zijlstra
@ 2020-12-01 16:53                           ` Paul E. McKenney
  2020-12-01 18:15                           ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Paul E. McKenney @ 2020-12-01 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian Borntraeger, Linus Torvalds, Sven Schnelle,
	Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> 
> > > So after having talked to Sven a bit, the thing that is happening, is
> > > that this is the one place where we take interrupts with RCU being
> > > disabled. Normally RCU is watching and all is well, except during idle.
> > 
> > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > Or did this fall victim to recent optimizations?
> 
> It does, but the problem is that s390 is still using
> trace_hardirqs_off(), which calls into tracing before RCU is enabled.
> 
> The entry order between lockdep, rcu and tracing is critical.
> 
> You can't call into tracing without RCU running,
> you can't call into RCU without lockdep setup,
> you can't call the (old) lockdep setup without landing in tracing.

Whew!  ;-)

							Thanx, Paul


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 14:55                         ` Peter Zijlstra
  2020-12-01 16:53                           ` Paul E. McKenney
@ 2020-12-01 18:15                           ` Peter Zijlstra
  2020-12-01 18:48                             ` Peter Zijlstra
  2020-12-01 18:57                             ` Mark Rutland
  1 sibling, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01 18:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christian Borntraeger, Linus Torvalds, Sven Schnelle,
	Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> 
> > > So after having talked to Sven a bit, the thing that is happening, is
> > > that this is the one place where we take interrupts with RCU being
> > > disabled. Normally RCU is watching and all is well, except during idle.
> > 
> > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > Or did this fall victim to recent optimizations?
> 
> It does, but the problem is that s390 is still using

I might've been too quick there, I can't actually seem to find where
s390 does rcu_irq_enter()/exit().

Also, I'm thinking the below might just about solve the current problem.
The next problem would then be it calling TRACE_IRQS_ON after it did
rcu_irq_exit()... :/


---
diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
index 9f75d67b8c20..24d3dd482df7 100644
--- a/arch/s390/include/asm/irq.h
+++ b/arch/s390/include/asm/irq.h
@@ -113,6 +113,8 @@ void irq_subclass_unregister(enum irq_subclass subclass);
 
 #define irq_canonicalize(irq)  (irq)
 
+extern __visible void ext_do_IRQ(struct pt_regs *regs, unsigned long vector);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_IRQ_H */
diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..b8e89b685038 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -976,16 +976,10 @@ ENTRY(ext_int_handler)
 	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
 	jo	.Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tmhh	%r8,0x300
-	jz	1f
-	TRACE_IRQS_OFF
-1:
-#endif
 	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 	lgr	%r2,%r11		# pass pointer to pt_regs
 	lghi	%r3,EXT_INTERRUPT
-	brasl	%r14,do_IRQ
+	brasl	%r14,ext_do_IRQ
 	j	.Lio_return
 ENDPROC(ext_int_handler)
 
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 3514420f0259..f4a29114e9fd 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -329,3 +329,23 @@ void irq_subclass_unregister(enum irq_subclass subclass)
 	spin_unlock(&irq_subclass_lock);
 }
 EXPORT_SYMBOL(irq_subclass_unregister);
+
+noinstr void ext_do_IRQ(struct pt_regs *regs, unsigned long vector)
+{
+	bool rcu = false;
+
+	lockdep_hardirqs_off(CALLER_ADDR0);
+	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
+		rcu_irq_enter();
+		rcu = true;
+	}
+	/* instrumentation_begin(); */
+
+	trace_hardirqs_off_finish();
+
+	do_IRQ(regs, vector);
+
+	/* instrumentation_end(); */
+	if (rcu)
+		rcu_irq_exit();
+}


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 18:15                           ` Peter Zijlstra
@ 2020-12-01 18:48                             ` Peter Zijlstra
  2020-12-01 18:57                             ` Mark Rutland
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01 18:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christian Borntraeger, Linus Torvalds, Sven Schnelle,
	Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > 
> > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > that this is the one place where we take interrupts with RCU being
> > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > 
> > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > Or did this fall victim to recent optimizations?
> > 
> > It does, but the problem is that s390 is still using
> 
> I might've been too quick there, I can't actually seem to find where
> s390 does rcu_irq_enter()/exit().

Argh, do_IRQ is per arch.. and that does irq_enter() which then does the
deed (thanks Sven!).


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 18:15                           ` Peter Zijlstra
  2020-12-01 18:48                             ` Peter Zijlstra
@ 2020-12-01 18:57                             ` Mark Rutland
  2020-12-01 19:14                               ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2020-12-01 18:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Christian Borntraeger, Linus Torvalds,
	Sven Schnelle, Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > 
> > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > that this is the one place where we take interrupts with RCU being
> > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > 
> > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > Or did this fall victim to recent optimizations?
> > 
> > It does, but the problem is that s390 is still using
> 
> I might've been too quick there, I can't actually seem to find where
> s390 does rcu_irq_enter()/exit().
> 
> Also, I'm thinking the below might just about solve the current problem.
> The next problem would then be it calling TRACE_IRQS_ON after it did
> rcu_irq_exit()... :/

I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
number of lockdep splats, but IIUC we need to handle the io_int_handler
path in addition to the ext_int_handler path, and there's a remaining
lockdep splat (below).

If this ends up looking like we'll need more point-fixes, I wonder if we
should conditionalise the new behaviour of the core idle code under a
new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
rest once they've had a chance to test. They'll still be broken in the
mean time, but no more so than they previously were.

Thanks,
Mark.

[    3.870912] ------------[ cut here ]------------
[    3.871225] DEBUG_LOCKS_WARN_ON(current->hardirq_chain_key != current->curr_chain_key)
[    3.872025] WARNING: CPU: 1 PID: 0 at kernel/locking/lockdep.c:4155 lockdep_hardirqs_on+0x150/0x158
[    3.872249] Modules linked in:
[    3.873096] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc6-dirty #9
[    3.873262] Hardware name: QEMU 2964 QEMU (KVM/Linux)
[    3.873584] Krnl PSW : 0404e00180000000 0000000000ce51ac (lockdep_hardirqs_on+0x154/0x158)
[    3.873967]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[    3.874214] Krnl GPRS: c0000000ffffefff 00000000011e5218 000000000000004a 00000000001d988a
[    3.874363]            00000000ffffffea 000003e000000001 0000000000000000 0000000000000000
[    3.874511]            0000000000000000 000000000141e5f0 0000000003278000 0000000000cf354a
[    3.874658]            0000000000000000 0000000000000000 0000000000ce51a8 000003e000073db0
[    3.876258] Krnl Code: 0000000000ce519c: c0200012432c        larl    %r2,0000000000f2d7f4
[    3.876258]            0000000000ce51a2: c0e5ffff83db        brasl   %r14,0000000000cd5958
[    3.876258]           #0000000000ce51a8: af000000            mc      0,0
[    3.876258]           >0000000000ce51ac: a7f4ffb3            brc     15,0000000000ce5112
[    3.876258]            0000000000ce51b0: ebeff0880024        stmg    %r14,%r15,136(%r15)
[    3.876258]            0000000000ce51b6: b90400ef            lgr     %r14,%r15
[    3.876258]            0000000000ce51ba: e3f0ffe0ff71        lay     %r15,-32(%r15)
[    3.876258]            0000000000ce51c0: e3e0f0980024        stg     %r14,152(%r15)
[    3.877823] Call Trace:
[    3.878073]  [<0000000000ce51ac>] lockdep_hardirqs_on+0x154/0x158 
[    3.878260] ([<0000000000ce51a8>] lockdep_hardirqs_on+0x150/0x158)
[    3.878406]  [<0000000000cf3596>] default_idle_call+0xb6/0x2a8 
[    3.878549]  [<0000000000194f60>] do_idle+0xe0/0x190 
[    3.878682]  [<00000000001952ae>] cpu_startup_entry+0x36/0x40 
[    3.878816]  [<0000000000118152>] smp_start_secondary+0x82/0x88 
[    3.878975] INFO: lockdep is turned off.
[    3.879119] Last Breaking-Event-Address:
[    3.879273]  [<0000000000cf76d0>] __s390_indirect_jump_r14+0x0/0xc
[    3.879456] irq event stamp: 104
[    3.879619] hardirqs last  enabled at (102): [<00000000001a540c>] load_balance+0x28c/0x990
[    3.879783] hardirqs last disabled at (103): [<0000000000cf7626>] __do_softirq+0x536/0x5c8
[    3.879945] softirqs last  enabled at (104): [<0000000000cf7584>] __do_softirq+0x494/0x5c8
[    3.880114] softirqs last disabled at (95): [<000000000010df60>] do_softirq_own_stack+0x70/0x80
[    3.880724] random: get_random_bytes called from __warn+0x12a/0x160 with crng_init=0
[    3.882060] ---[ end trace ffae6482e242107b ]---

> ---
> diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
> index 9f75d67b8c20..24d3dd482df7 100644
> --- a/arch/s390/include/asm/irq.h
> +++ b/arch/s390/include/asm/irq.h
> @@ -113,6 +113,8 @@ void irq_subclass_unregister(enum irq_subclass subclass);
>  
>  #define irq_canonicalize(irq)  (irq)
>  
> +extern __visible void ext_do_IRQ(struct pt_regs *regs, unsigned long vector);
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* _ASM_IRQ_H */
> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..b8e89b685038 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -976,16 +976,10 @@ ENTRY(ext_int_handler)
>  	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>  	jo	.Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tmhh	%r8,0x300
> -	jz	1f
> -	TRACE_IRQS_OFF
> -1:
> -#endif
>  	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  	lgr	%r2,%r11		# pass pointer to pt_regs
>  	lghi	%r3,EXT_INTERRUPT
> -	brasl	%r14,do_IRQ
> +	brasl	%r14,ext_do_IRQ
>  	j	.Lio_return
>  ENDPROC(ext_int_handler)
>  
> diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
> index 3514420f0259..f4a29114e9fd 100644
> --- a/arch/s390/kernel/irq.c
> +++ b/arch/s390/kernel/irq.c
> @@ -329,3 +329,23 @@ void irq_subclass_unregister(enum irq_subclass subclass)
>  	spin_unlock(&irq_subclass_lock);
>  }
>  EXPORT_SYMBOL(irq_subclass_unregister);
> +
> +noinstr void ext_do_IRQ(struct pt_regs *regs, unsigned long vector)
> +{
> +	bool rcu = false;
> +
> +	lockdep_hardirqs_off(CALLER_ADDR0);
> +	if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
> +		rcu_irq_enter();
> +		rcu = true;
> +	}
> +	/* instrumentation_begin(); */
> +
> +	trace_hardirqs_off_finish();
> +
> +	do_IRQ(regs, vector);
> +
> +	/* instrumentation_end(); */
> +	if (rcu)
> +		rcu_irq_exit();
> +}


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 18:57                             ` Mark Rutland
@ 2020-12-01 19:14                               ` Peter Zijlstra
  2020-12-01 19:18                                 ` Heiko Carstens
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-01 19:14 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paul E. McKenney, Christian Borntraeger, Linus Torvalds,
	Sven Schnelle, Thomas Gleixner, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 06:57:37PM +0000, Mark Rutland wrote:
> On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> > On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > > 
> > > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > > that this is the one place where we take interrupts with RCU being
> > > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > > 
> > > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > > Or did this fall victim to recent optimizations?
> > > 
> > > It does, but the problem is that s390 is still using
> > 
> > I might've been too quick there, I can't actually seem to find where
> > s390 does rcu_irq_enter()/exit().
> > 
> > Also, I'm thinking the below might just about solve the current problem.
> > The next problem would then be it calling TRACE_IRQS_ON after it did
> > rcu_irq_exit()... :/
> 
> I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
> PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
> number of lockdep splats, but IIUC we need to handle the io_int_handler
> path in addition to the ext_int_handler path, and there's a remaining
> lockdep splat (below).

I'm amazed it didn't actually make things worse, given how I failed to
spot do_IRQ() was arch code etc..

> If this ends up looking like we'll need more point-fixes, I wonder if we
> should conditionalise the new behaviour of the core idle code under a
> new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
> rest once they've had a chance to test. They'll still be broken in the
> mean time, but no more so than they previously were.

We can do that I suppose... :/


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 19:14                               ` Peter Zijlstra
@ 2020-12-01 19:18                                 ` Heiko Carstens
  2020-12-02  9:21                                   ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Heiko Carstens @ 2020-12-01 19:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Paul E. McKenney, Christian Borntraeger,
	Linus Torvalds, Sven Schnelle, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Dec 01, 2020 at 08:14:41PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 06:57:37PM +0000, Mark Rutland wrote:
> > On Tue, Dec 01, 2020 at 07:15:06PM +0100, Peter Zijlstra wrote:
> > > On Tue, Dec 01, 2020 at 03:55:19PM +0100, Peter Zijlstra wrote:
> > > > On Tue, Dec 01, 2020 at 06:46:44AM -0800, Paul E. McKenney wrote:
> > > > 
> > > > > > So after having talked to Sven a bit, the thing that is happening, is
> > > > > > that this is the one place where we take interrupts with RCU being
> > > > > > disabled. Normally RCU is watching and all is well, except during idle.
> > > > > 
> > > > > Isn't interrupt entry supposed to invoke rcu_irq_enter() at some point?
> > > > > Or did this fall victim to recent optimizations?
> > > > 
> > > > It does, but the problem is that s390 is still using
> > > 
> > > I might've been too quick there, I can't actually seem to find where
> > > s390 does rcu_irq_enter()/exit().
> > > 
> > > Also, I'm thinking the below might just about solve the current problem.
> > > The next problem would then be it calling TRACE_IRQS_ON after it did
> > > rcu_irq_exit()... :/
> > 
> > I gave this patch a go under QEMU TCG atop v5.10-rc6 s390 defconfig with
> > PROVE_LOCKING and DEBUG_ATOMIC_SLEEP. It significantly reduces the
> > number of lockdep splats, but IIUC we need to handle the io_int_handler
> > path in addition to the ext_int_handler path, and there's a remaining
> > lockdep splat (below).
> 
> I'm amazed it didn't actually make things worse, given how I failed to
> spot do_IRQ() was arch code etc..
> 
> > If this ends up looking like we'll need more point-fixes, I wonder if we
> > should conditionalise the new behaviour of the core idle code under a
> > new CONFIG symbol for now, and opt-in x86 and arm64, then transition the
> > rest once they've had a chance to test. They'll still be broken in the
> > mean time, but no more so than they previously were.
> 
> We can do that I suppose... :/

Well, the following small patch works for me (plus an additional call to
trace_hardirqs_on() in our udelay implementation - but that's probably
independent).
Is there a reason why this should be considered broken?

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..92beb1444644 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -763,12 +763,7 @@ ENTRY(io_int_handler)
 	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
 	jo	.Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tmhh	%r8,0x300
-	jz	1f
 	TRACE_IRQS_OFF
-1:
-#endif
 	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 .Lio_loop:
 	lgr	%r2,%r11		# pass pointer to pt_regs
@@ -791,12 +786,7 @@ ENTRY(io_int_handler)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_WORK
 	jnz	.Lio_work
 .Lio_restore:
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tm	__PT_PSW(%r11),3
-	jno	0f
 	TRACE_IRQS_ON
-0:
-#endif
 	mvc	__LC_RETURN_PSW(16),__PT_PSW(%r11)
 	tm	__PT_PSW+1(%r11),0x01	# returning to user ?
 	jno	.Lio_exit_kernel
@@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
 	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
 	jo	.Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tmhh	%r8,0x300
-	jz	1f
 	TRACE_IRQS_OFF
-1:
-#endif
 	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 	lgr	%r2,%r11		# pass pointer to pt_regs
 	lghi	%r3,EXT_INTERRUPT
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index 2b85096964f8..5bd8c1044d09 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
 void arch_cpu_idle(void)
 {
 	enabled_wait();
-	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_exit(void)


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01  7:56           ` Peter Zijlstra
@ 2020-12-01 19:45             ` Linus Torvalds
  2020-12-02  7:53               ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2020-12-01 19:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Mon, Nov 30, 2020 at 11:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> So even if an architecture needs to enable interrupts on idle, we need
> it disabled again when coming out. So we might as well have the arch
> idle routine then be: STI; HLT; CLI; because then architectures that can
> idle with interrupts disabled can avoid mucking about with the interrupt
> state entirely.

But that's not what the code is doing.

Go look at it.

It does sti;hlt;cli;pushf;cli;sti.

All for no good reason - because the code is structured so that even
if all the tracking and lockdep is disabled, the pointless "let's
protect the tracking from interrupts" is still there.

See what I am complaining about?

                Linus


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 19:45             ` Linus Torvalds
@ 2020-12-02  7:53               ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-02  7:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Paul E. McKenney, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Dec 01, 2020 at 11:45:25AM -0800, Linus Torvalds wrote:
> On Mon, Nov 30, 2020 at 11:56 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > So even if an architecture needs to enable interrupts on idle, we need
> > it disabled again when coming out. So we might as well have the arch
> > idle routine then be: STI; HLT; CLI; because then architectures that can
> > idle with interrupts disabled can avoid mucking about with the interrupt
> > state entirely.
> 
> But that's not what the code is doing.
> 
> Go look at it.
> 
> It does sti;hlt;cli;pushf;cli;sti.
> 
> All for no good reason - because the code is structured so that even
> if all the tracking and lockdep is disabled, the pointless "let's
> protect the tracking from interrupts" is still there.
> 
> See what I am complaining about?

Absolutely.

  default_idle()
    arch_cpu_idle()
      sti; hlt;
    cli;
    rcu_idle_exit()
      pushf;
      cli;
      rcu_eqs_exit(false);
      popf;
    sti;

is what it currently looks like, and that's completely insane, no
argument.

What I would like to end up with is:

  default_idle()
    arch_cpu_idle()
      sti; hlt; cli
    rcu_idle_exit()
      rcu_eqs_exit(false);
    sti;

Which would allow architectures that can idle with IRQs disabled to do
so. But that needs a little more work:

 - make arch_cpu_idle() IRQ invariant (we enter and exit with IRQs off)
 - make cpuidle drivers do similar
 - audit all rcu_idle_exit() callers and remove save/restore



* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-11-30 13:03             ` Peter Zijlstra
  2020-11-30 18:04               ` Linus Torvalds
  2020-12-01  7:48               ` Sven Schnelle
@ 2020-12-02  7:54               ` Heiko Carstens
  2020-12-02  9:38                 ` Peter Zijlstra
  2 siblings, 1 reply; 38+ messages in thread
From: Heiko Carstens @ 2020-12-02  7:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sven Schnelle, Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

> > But but but...
> > 
> >   do_idle()			# IRQs on
> >     local_irq_disable();	# IRQs off
> >     default_idle_call()	# IRQs off
> 	lockdep_hardirqs_on();	# IRQs off, but lockdep thinks they're on
> >       arch_cpu_idle()		# IRQs off
> >         enabled_wait()	# IRQs off
> > 	  raw_local_save()	# still off
> > 	  psw_idle()		# very much off
> > 	    ext_int_handler	# get an interrupt ?!?!
> 	      rcu_irq_enter()	# lockdep thinks IRQs are on <- FAIL
> 
> I can't much read s390 assembler, but ext_int_handler() has a
> TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> with the actual state, but there's some condition before it, what's that
> test and is that right?

After digging a bit into our asm code: no, it is not right, but it is
wrong only for psw_idle().

What happens with the current code:

- default_idle_call() calls lockdep_hardirqs_on() before calling into
  arch_cpu_idle()

- our arch_cpu_idle() calls psw_idle() which enables irqs. the irq
  handler will call/use the SWITCH_ASYNC macro which clears the
  interrupt enabled bits in the old program status word (_only_ for
  psw_idle)

- this again causes the interrupt handler to _not_ call TRACE_IRQS_OFF
  and therefore lockdep thinks interrupts are enabled within the
  interrupt handler

So I guess my patch which I sent yesterday evening should fix all that
mess - plus an explicit trace_hardirqs_off() call in our udelay
implementation is required now.

I'll send a proper patch later.


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-01 19:18                                 ` Heiko Carstens
@ 2020-12-02  9:21                                   ` Peter Zijlstra
  2020-12-02 10:56                                     ` Heiko Carstens
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-02  9:21 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Mark Rutland, Paul E. McKenney, Christian Borntraeger,
	Linus Torvalds, Sven Schnelle, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Dec 01, 2020 at 08:18:56PM +0100, Heiko Carstens wrote:

> Is there a reason why this should be considered broken?

AFAICT this is good.

> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..92beb1444644 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -763,12 +763,7 @@ ENTRY(io_int_handler)
>  	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>  	jo	.Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tmhh	%r8,0x300
> -	jz	1f
>  	TRACE_IRQS_OFF
> -1:
> -#endif
>  	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  .Lio_loop:
>  	lgr	%r2,%r11		# pass pointer to pt_regs
> @@ -791,12 +786,7 @@ ENTRY(io_int_handler)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_WORK
>  	jnz	.Lio_work
>  .Lio_restore:
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tm	__PT_PSW(%r11),3
> -	jno	0f
>  	TRACE_IRQS_ON
> -0:
> -#endif
>  	mvc	__LC_RETURN_PSW(16),__PT_PSW(%r11)
>  	tm	__PT_PSW+1(%r11),0x01	# returning to user ?
>  	jno	.Lio_exit_kernel
> @@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
>  	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>  	jo	.Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tmhh	%r8,0x300
> -	jz	1f
>  	TRACE_IRQS_OFF
> -1:
> -#endif
>  	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  	lgr	%r2,%r11		# pass pointer to pt_regs
>  	lghi	%r3,EXT_INTERRUPT

OK, so with a little help from s390/PoO and Sven, the code removed skips
the TRACE_IRQS_OFF when IRQs were enabled in the old PSW (the previous
context).

That sounds entirely the right thing. Irrespective of what the previous
IRQ state was, the current state is off.

> diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
> index 2b85096964f8..5bd8c1044d09 100644
> --- a/arch/s390/kernel/idle.c
> +++ b/arch/s390/kernel/idle.c
> @@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
>  void arch_cpu_idle(void)
>  {
>  	enabled_wait();
> -	raw_local_irq_enable();
>  }

Currently arch_cpu_idle() is defined to return with IRQs enabled,
however, the very first thing we do when we return is
raw_local_irq_disable(), so this change is harmless.

It is also the direction I've been arguing for elsewhere in this thread.
So I'm certainly not complaining.



* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-02  7:54               ` Heiko Carstens
@ 2020-12-02  9:38                 ` Peter Zijlstra
  2020-12-02 10:52                   ` Heiko Carstens
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-02  9:38 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Sven Schnelle, Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Dec 02, 2020 at 08:54:27AM +0100, Heiko Carstens wrote:
> > > But but but...
> > > 
> > >   do_idle()			# IRQs on
> > >     local_irq_disable();	# IRQs off
> > >     default_idle_call()	# IRQs off
> > 	lockdep_hardirqs_on();	# IRQs off, but lockdep thinks they're on
> > >       arch_cpu_idle()		# IRQs off
> > >         enabled_wait()	# IRQs off
> > > 	  raw_local_save()	# still off
> > > 	  psw_idle()		# very much off
> > > 	    ext_int_handler	# get an interrupt ?!?!
> > 	      rcu_irq_enter()	# lockdep thinks IRQs are on <- FAIL
> > 
> > I can't much read s390 assembler, but ext_int_handler() has a
> > TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > with the actual state, but there's some condition before it, what's that
> > test and is that right?
> 
> After digging a bit into our asm code: no, it is not right, but it is
> wrong only for the psw_idle() case.
> 
> What happens with the current code:
> 
> - default_idle_call() calls lockdep_hardirqs_on() before calling into
>   arch_cpu_idle()

Correct.

> - our arch_cpu_idle() calls psw_idle() which enables irqs. the irq
>   handler will call/use the SWITCH_ASYNC macro which clears the
>   interrupt enabled bits in the old program status word (_only_ for
>   psw_idle)

This is the condition at 0: that compares r13 to psw_idle_exit?

> - this again causes the interrupt handler to _not_ call TRACE_IRQS_OFF
>   and therefore lockdep thinks interrupts are enabled within the
>   interrupt handler
> 
> So I guess my patch which I sent yesterday evening should fix all that
> mess

Yes, afaict it does the right thing. Exceptions should call
TRACE_IRQS_OFF unconditionally, since the hardware will disable
interrupts upon taking an exception, irrespective of what the previous
context had.

On exception return the previous IRQ state is inspected and lockdep is
changed to match (except for NMIs, which either are ignored by lockdep
or need a little bit of extra care).


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-02  9:38                 ` Peter Zijlstra
@ 2020-12-02 10:52                   ` Heiko Carstens
  0 siblings, 0 replies; 38+ messages in thread
From: Heiko Carstens @ 2020-12-02 10:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sven Schnelle, Linus Torvalds, Thomas Gleixner, Paul E. McKenney,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Dec 02, 2020 at 10:38:05AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 02, 2020 at 08:54:27AM +0100, Heiko Carstens wrote:
> > > > But but but...
> > > > 
> > > >   do_idle()			# IRQs on
> > > >     local_irq_disable();	# IRQs off
> > > >     default_idle_call()	# IRQs off
> > > 	lockdep_hardirqs_on();	# IRQs off, but lockdep thinks they're on
> > > >       arch_cpu_idle()		# IRQs off
> > > >         enabled_wait()	# IRQs off
> > > > 	  raw_local_save()	# still off
> > > > 	  psw_idle()		# very much off
> > > > 	    ext_int_handler	# get an interrupt ?!?!
> > > 	      rcu_irq_enter()	# lockdep thinks IRQs are on <- FAIL
> > > 
> > > I can't much read s390 assembler, but ext_int_handler() has a
> > > TRACE_IRQS_OFF, which would be sufficient to re-align the lockdep state
> > > with the actual state, but there's some condition before it, what's that
> > > test and is that right?
> > 
> > After digging a bit into our asm code: no, it is not right, but it is
> > wrong only for the psw_idle() case.
> > 
> > What happens with the current code:
> > 
> > - default_idle_call() calls lockdep_hardirqs_on() before calling into
> >   arch_cpu_idle()
> 
> Correct.
> 
> > - our arch_cpu_idle() calls psw_idle() which enables irqs. the irq
> >   handler will call/use the SWITCH_ASYNC macro which clears the
> >   interrupt enabled bits in the old program status word (_only_ for
> >   psw_idle)
> 
> This is the condition at 0: that compares r13 to psw_idle_exit?

Yes, exactly.

> > So I guess my patch which I sent yesterday evening should fix all that
> > mess
> 
> Yes, afaict it does the right thing. Exceptions should call
> TRACE_IRQS_OFF unconditionally, since the hardware will disable
> interrupts upon taking an exception, irrespective of what the previous
> context had.
> 
> On exception return the previous IRQ state is inspected and lockdep is
> changed to match (except for NMIs, which either are ignored by lockdep
> or need a little bit of extra care).

Yes, we do all that, except that it seems odd to test the previous
state for interrupts (as opposed to exceptions), since for interrupts
the previous context obviously must have had interrupts enabled.

Except if you play tricks with the old PSW, like we do :/


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-02  9:21                                   ` Peter Zijlstra
@ 2020-12-02 10:56                                     ` Heiko Carstens
  2020-12-02 11:16                                       ` Mark Rutland
  2020-12-02 12:27                                       ` Peter Zijlstra
  0 siblings, 2 replies; 38+ messages in thread
From: Heiko Carstens @ 2020-12-02 10:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Paul E. McKenney, Christian Borntraeger,
	Linus Torvalds, Sven Schnelle, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Dec 02, 2020 at 10:21:16AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 01, 2020 at 08:18:56PM +0100, Heiko Carstens wrote:
> OK, so with a little help from s390/PoO and Sven, the removed code skips
> the TRACE_IRQS_OFF unless IRQs were enabled in the old PSW (the previous
> context).
> 
> That sounds entirely the right thing. Irrespective of what the previous
> IRQ state was, the current state is off.
> 
> > diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
> > index 2b85096964f8..5bd8c1044d09 100644
> > --- a/arch/s390/kernel/idle.c
> > +++ b/arch/s390/kernel/idle.c
> > @@ -123,7 +123,6 @@ void arch_cpu_idle_enter(void)
> >  void arch_cpu_idle(void)
> >  {
> >  	enabled_wait();
> > -	raw_local_irq_enable();
> >  }
> 
> Currently arch_cpu_idle() is defined to return with IRQs enabled,
> however, the very first thing we do when we return is
> raw_local_irq_disable(), so this change is harmless.
> 
> It is also the direction I've been arguing for elsewhere in this thread.
> So I'm certainly not complaining.

So I left that raw_local_irq_enable() in to be consistent with other
architectures. enabled_wait() now returns with irqs disabled, but with
a lockdep state that says irqs are on...  See patch below.
Works and hopefully makes sense ;)

In addition (but not for rc7) I want to get rid of our complex udelay
implementation. I think we don't need that anymore, so there would be
only the idle code left where we have to play tricks.

From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <hca@linux.ibm.com>
Date: Wed, 2 Dec 2020 11:46:01 +0100
Subject: [PATCH] s390: fix irq state tracing

With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
tracing") common code calls arch_cpu_idle() with a lockdep state that
tells irqs are on.

This doesn't work very well for s390: psw_idle() will enable interrupts
to wait for an interrupt. As soon as an interrupt occurs the interrupt
handler will verify if the old context was psw_idle(). If that is the
case the interrupt enablement bits in the old program status word will
be cleared.

A subsequent test in both the external as well as the io interrupt
handler checks if in the old context interrupts were enabled. Due to
the above patching of the old program status word it is assumed the
old context had interrupts disabled, and therefore a call to
TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
turn makes lockdep incorrectly "think" that interrupts are enabled
within the interrupt handler.

Fix this by unconditionally calling TRACE_IRQS_OFF when entering
interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
leaving interrupt handlers.

This leaves the special psw_idle() case, which now returns with
interrupts disabled, but has an "irqs on" lockdep state. So callers of
psw_idle() must adjust the state on their own, if required. This is
currently only __udelay_disabled().

Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
 arch/s390/kernel/entry.S | 15 ---------------
 arch/s390/lib/delay.c    |  5 ++---
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
index 26bb0603c5a1..92beb1444644 100644
--- a/arch/s390/kernel/entry.S
+++ b/arch/s390/kernel/entry.S
@@ -763,12 +763,7 @@ ENTRY(io_int_handler)
 	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
 	jo	.Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tmhh	%r8,0x300
-	jz	1f
 	TRACE_IRQS_OFF
-1:
-#endif
 	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 .Lio_loop:
 	lgr	%r2,%r11		# pass pointer to pt_regs
@@ -791,12 +786,7 @@ ENTRY(io_int_handler)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_WORK
 	jnz	.Lio_work
 .Lio_restore:
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tm	__PT_PSW(%r11),3
-	jno	0f
 	TRACE_IRQS_ON
-0:
-#endif
 	mvc	__LC_RETURN_PSW(16),__PT_PSW(%r11)
 	tm	__PT_PSW+1(%r11),0x01	# returning to user ?
 	jno	.Lio_exit_kernel
@@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
 	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
 	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
 	jo	.Lio_restore
-#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
-	tmhh	%r8,0x300
-	jz	1f
 	TRACE_IRQS_OFF
-1:
-#endif
 	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
 	lgr	%r2,%r11		# pass pointer to pt_regs
 	lghi	%r3,EXT_INTERRUPT
diff --git a/arch/s390/lib/delay.c b/arch/s390/lib/delay.c
index daca7bad66de..8c0c68e7770e 100644
--- a/arch/s390/lib/delay.c
+++ b/arch/s390/lib/delay.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(__delay);
 
 static void __udelay_disabled(unsigned long long usecs)
 {
-	unsigned long cr0, cr0_new, psw_mask, flags;
+	unsigned long cr0, cr0_new, psw_mask;
 	struct s390_idle_data idle;
 	u64 end;
 
@@ -45,9 +45,8 @@ static void __udelay_disabled(unsigned long long usecs)
 	psw_mask = __extract_psw() | PSW_MASK_EXT | PSW_MASK_WAIT;
 	set_clock_comparator(end);
 	set_cpu_flag(CIF_IGNORE_IRQ);
-	local_irq_save(flags);
 	psw_idle(&idle, psw_mask);
-	local_irq_restore(flags);
+	trace_hardirqs_off();
 	clear_cpu_flag(CIF_IGNORE_IRQ);
 	set_clock_comparator(S390_lowcore.clock_comparator);
 	__ctl_load(cr0, 0, 0);
-- 
2.17.1



* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-02 10:56                                     ` Heiko Carstens
@ 2020-12-02 11:16                                       ` Mark Rutland
  2020-12-02 13:46                                         ` Heiko Carstens
  2020-12-02 12:27                                       ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Mark Rutland @ 2020-12-02 11:16 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Peter Zijlstra, Paul E. McKenney, Christian Borntraeger,
	Linus Torvalds, Sven Schnelle, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens <hca@linux.ibm.com>
> Date: Wed, 2 Dec 2020 11:46:01 +0100
> Subject: [PATCH] s390: fix irq state tracing
> 
> With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> tracing") common code calls arch_cpu_idle() with a lockdep state that
> tells irqs are on.
> 
> This doesn't work very well for s390: psw_idle() will enable interrupts
> to wait for an interrupt. As soon as an interrupt occurs the interrupt
> handler will verify if the old context was psw_idle(). If that is the
> case the interrupt enablement bits in the old program status word will
> be cleared.
> 
> A subsequent test in both the external as well as the io interrupt
> handler checks if in the old context interrupts were enabled. Due to
> the above patching of the old program status word it is assumed the
> old context had interrupts disabled, and therefore a call to
> TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
> turn makes lockdep incorrectly "think" that interrupts are enabled
> within the interrupt handler.
> 
> Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> leaving interrupt handlers.
> 
> This leaves the special psw_idle() case, which now returns with
> interrupts disabled, but has an "irqs on" lockdep state. So callers of
> psw_idle() must adjust the state on their own, if required. This is
> currently only __udelay_disabled().
> 
> Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

FWIW, this makes sense to me from what I had to chase on the arm64 side,
and this seems happy atop v5.10-rc6 with all the lockdep and RCU debug
options enabled when booting to userspace under QEMU.

Thanks,
Mark.

> ---
>  arch/s390/kernel/entry.S | 15 ---------------
>  arch/s390/lib/delay.c    |  5 ++---
>  2 files changed, 2 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S
> index 26bb0603c5a1..92beb1444644 100644
> --- a/arch/s390/kernel/entry.S
> +++ b/arch/s390/kernel/entry.S
> @@ -763,12 +763,7 @@ ENTRY(io_int_handler)
>  	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>  	jo	.Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tmhh	%r8,0x300
> -	jz	1f
>  	TRACE_IRQS_OFF
> -1:
> -#endif
>  	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  .Lio_loop:
>  	lgr	%r2,%r11		# pass pointer to pt_regs
> @@ -791,12 +786,7 @@ ENTRY(io_int_handler)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_WORK
>  	jnz	.Lio_work
>  .Lio_restore:
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tm	__PT_PSW(%r11),3
> -	jno	0f
>  	TRACE_IRQS_ON
> -0:
> -#endif
>  	mvc	__LC_RETURN_PSW(16),__PT_PSW(%r11)
>  	tm	__PT_PSW+1(%r11),0x01	# returning to user ?
>  	jno	.Lio_exit_kernel
> @@ -976,12 +966,7 @@ ENTRY(ext_int_handler)
>  	xc	__PT_FLAGS(8,%r11),__PT_FLAGS(%r11)
>  	TSTMSK	__LC_CPU_FLAGS,_CIF_IGNORE_IRQ
>  	jo	.Lio_restore
> -#if IS_ENABLED(CONFIG_TRACE_IRQFLAGS)
> -	tmhh	%r8,0x300
> -	jz	1f
>  	TRACE_IRQS_OFF
> -1:
> -#endif
>  	xc	__SF_BACKCHAIN(8,%r15),__SF_BACKCHAIN(%r15)
>  	lgr	%r2,%r11		# pass pointer to pt_regs
>  	lghi	%r3,EXT_INTERRUPT
> diff --git a/arch/s390/lib/delay.c b/arch/s390/lib/delay.c
> index daca7bad66de..8c0c68e7770e 100644
> --- a/arch/s390/lib/delay.c
> +++ b/arch/s390/lib/delay.c
> @@ -33,7 +33,7 @@ EXPORT_SYMBOL(__delay);
>  
>  static void __udelay_disabled(unsigned long long usecs)
>  {
> -	unsigned long cr0, cr0_new, psw_mask, flags;
> +	unsigned long cr0, cr0_new, psw_mask;
>  	struct s390_idle_data idle;
>  	u64 end;
>  
> @@ -45,9 +45,8 @@ static void __udelay_disabled(unsigned long long usecs)
>  	psw_mask = __extract_psw() | PSW_MASK_EXT | PSW_MASK_WAIT;
>  	set_clock_comparator(end);
>  	set_cpu_flag(CIF_IGNORE_IRQ);
> -	local_irq_save(flags);
>  	psw_idle(&idle, psw_mask);
> -	local_irq_restore(flags);
> +	trace_hardirqs_off();
>  	clear_cpu_flag(CIF_IGNORE_IRQ);
>  	set_clock_comparator(S390_lowcore.clock_comparator);
>  	__ctl_load(cr0, 0, 0);
> -- 
> 2.17.1
> 


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-02 10:56                                     ` Heiko Carstens
  2020-12-02 11:16                                       ` Mark Rutland
@ 2020-12-02 12:27                                       ` Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-12-02 12:27 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Mark Rutland, Paul E. McKenney, Christian Borntraeger,
	Linus Torvalds, Sven Schnelle, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> From: Heiko Carstens <hca@linux.ibm.com>
> Date: Wed, 2 Dec 2020 11:46:01 +0100
> Subject: [PATCH] s390: fix irq state tracing
> 
> With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> tracing") common code calls arch_cpu_idle() with a lockdep state that
> tells irqs are on.
> 
> This doesn't work very well for s390: psw_idle() will enable interrupts
> to wait for an interrupt. As soon as an interrupt occurs the interrupt
> handler will verify if the old context was psw_idle(). If that is the
> case the interrupt enablement bits in the old program status word will
> be cleared.
> 
> A subsequent test in both the external as well as the io interrupt
> handler checks if in the old context interrupts were enabled. Due to
> the above patching of the old program status word it is assumed the
> old context had interrupts disabled, and therefore a call to
> TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
> turn makes lockdep incorrectly "think" that interrupts are enabled
> within the interrupt handler.
> 
> Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> leaving interrupt handlers.
> 
> This leaves the special psw_idle() case, which now returns with
> interrupts disabled, but has an "irqs on" lockdep state. So callers of
> psw_idle() must adjust the state on their own, if required. This is
> currently only __udelay_disabled().
> 
> Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

Thanks for sorting this Heiko!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [GIT pull] locking/urgent for v5.10-rc6
  2020-12-02 11:16                                       ` Mark Rutland
@ 2020-12-02 13:46                                         ` Heiko Carstens
  0 siblings, 0 replies; 38+ messages in thread
From: Heiko Carstens @ 2020-12-02 13:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Peter Zijlstra, Paul E. McKenney, Christian Borntraeger,
	Linus Torvalds, Sven Schnelle, Thomas Gleixner,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Dec 02, 2020 at 11:16:05AM +0000, Mark Rutland wrote:
> On Wed, Dec 02, 2020 at 11:56:49AM +0100, Heiko Carstens wrote:
> > From 7bd86fb3eb039a4163281472ca79b9158e726526 Mon Sep 17 00:00:00 2001
> > From: Heiko Carstens <hca@linux.ibm.com>
> > Date: Wed, 2 Dec 2020 11:46:01 +0100
> > Subject: [PATCH] s390: fix irq state tracing
> > 
> > With commit 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs
> > tracing") common code calls arch_cpu_idle() with a lockdep state that
> > tells irqs are on.
> > 
> > This doesn't work very well for s390: psw_idle() will enable interrupts
> > to wait for an interrupt. As soon as an interrupt occurs the interrupt
> > handler will verify if the old context was psw_idle(). If that is the
> > case the interrupt enablement bits in the old program status word will
> > be cleared.
> > 
> > A subsequent test in both the external as well as the io interrupt
> > handler checks if in the old context interrupts were enabled. Due to
> > the above patching of the old program status word it is assumed the
> > old context had interrupts disabled, and therefore a call to
> > TRACE_IRQS_OFF (aka trace_hardirqs_off_caller) is skipped. Which in
> > turn makes lockdep incorrectly "think" that interrupts are enabled
> > within the interrupt handler.
> > 
> > Fix this by unconditionally calling TRACE_IRQS_OFF when entering
> > interrupt handlers. Also unconditionally call TRACE_IRQS_ON when
> > leaving interrupt handlers.
> > 
> > This leaves the special psw_idle() case, which now returns with
> > interrupts disabled, but has an "irqs on" lockdep state. So callers of
> > psw_idle() must adjust the state on their own, if required. This is
> > currently only __udelay_disabled().
> > 
> > Fixes: 58c644ba512c ("sched/idle: Fix arch_cpu_idle() vs tracing")
> > Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
> 
> FWIW, this makes sense to me from what I had to chase on the arm64 side,
> and this seems happy atop v5.10-rc6 with all the lockdep and RCU debug
> options enabled when booting to userspace under QEMU.
> 
> Thanks,
> Mark.

Thanks a lot for having a look and testing this!


end of thread, other threads:[~2020-12-02 13:47 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-29 13:37 [GIT pull] irq/urgent for v5.10-rc6 Thomas Gleixner
2020-11-29 13:38 ` [GIT pull] locking/urgent " Thomas Gleixner
2020-11-29 19:31   ` Linus Torvalds
2020-11-30  0:29     ` Paul E. McKenney
2020-11-30  7:56     ` Peter Zijlstra
2020-11-30  8:23       ` Sven Schnelle
2020-11-30 12:31         ` Sven Schnelle
2020-11-30 12:52           ` Peter Zijlstra
2020-11-30 13:03             ` Peter Zijlstra
2020-11-30 18:04               ` Linus Torvalds
2020-11-30 19:31                 ` Christian Borntraeger
2020-12-01  8:07                   ` Peter Zijlstra
2020-12-01 11:07                     ` Peter Zijlstra
2020-12-01 14:46                       ` Paul E. McKenney
2020-12-01 14:55                         ` Peter Zijlstra
2020-12-01 16:53                           ` Paul E. McKenney
2020-12-01 18:15                           ` Peter Zijlstra
2020-12-01 18:48                             ` Peter Zijlstra
2020-12-01 18:57                             ` Mark Rutland
2020-12-01 19:14                               ` Peter Zijlstra
2020-12-01 19:18                                 ` Heiko Carstens
2020-12-02  9:21                                   ` Peter Zijlstra
2020-12-02 10:56                                     ` Heiko Carstens
2020-12-02 11:16                                       ` Mark Rutland
2020-12-02 13:46                                         ` Heiko Carstens
2020-12-02 12:27                                       ` Peter Zijlstra
2020-12-01  7:48               ` Sven Schnelle
2020-12-02  7:54               ` Heiko Carstens
2020-12-02  9:38                 ` Peter Zijlstra
2020-12-02 10:52                   ` Heiko Carstens
2020-11-30  8:36       ` Peter Zijlstra
2020-11-30 17:55       ` Linus Torvalds
2020-12-01  7:39         ` Peter Zijlstra
2020-12-01  7:56           ` Peter Zijlstra
2020-12-01 19:45             ` Linus Torvalds
2020-12-02  7:53               ` Peter Zijlstra
2020-11-29 20:06   ` pr-tracker-bot
2020-11-29 20:06 ` [GIT pull] irq/urgent " pr-tracker-bot
