* [RFC][PATCH 0/9] Rework cpuidle vs instrumentation
@ 2022-05-19 21:27 Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 1/9] x86/perf/amd: Remove tracing from perf_lopwr_cb() Peter Zijlstra
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

Hi,

Most of these patches (with some obvious exceptions) stem from the period when
we did the entry rewrite. I might even have posted some of them, but they
landed on the floor and went unloved for a while.

Luckily, Frederic added a comment in his context tracking rework that made me
remember these.

They build and boot on x86_64. Be sure to build with DEBUG_ENTRY enabled to find
noinstr violations -- I fixed all I found, but I didn't do allyesconfig builds.
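
For reference, that means building with something like this .config
fragment (on x86, CONFIG_DEBUG_ENTRY lives under "Kernel hacking"):

  CONFIG_DEBUG_ENTRY=y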

Please have a look.

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/idle

---
 arch/alpha/kernel/process.c       |  1 -
 arch/arc/kernel/process.c         |  3 ++
 arch/arm/kernel/process.c         |  1 -
 arch/arm/mach-gemini/board-dt.c   |  3 +-
 arch/arm/mach-imx/cpuidle-imx6q.c |  5 +--
 arch/arm64/kernel/idle.c          |  1 -
 arch/csky/kernel/process.c        |  1 -
 arch/csky/kernel/smp.c            |  2 +-
 arch/h8300/kernel/process.c       |  1 +
 arch/hexagon/kernel/process.c     |  1 -
 arch/ia64/kernel/process.c        |  1 +
 arch/microblaze/kernel/process.c  |  1 -
 arch/mips/kernel/idle.c           |  8 ++---
 arch/nios2/kernel/process.c       |  1 -
 arch/openrisc/kernel/process.c    |  1 +
 arch/parisc/kernel/process.c      |  2 --
 arch/powerpc/kernel/idle.c        |  5 ++-
 arch/riscv/kernel/process.c       |  1 -
 arch/s390/kernel/idle.c           |  1 -
 arch/sh/kernel/idle.c             |  1 +
 arch/sparc/kernel/leon_pmc.c      |  4 +++
 arch/sparc/kernel/process_32.c    |  1 -
 arch/sparc/kernel/process_64.c    |  3 +-
 arch/um/kernel/process.c          |  1 -
 arch/x86/coco/tdx/tdx.c           |  3 ++
 arch/x86/events/amd/brs.c         | 13 +++-----
 arch/x86/include/asm/irqflags.h   | 11 +++----
 arch/x86/include/asm/mwait.h      |  2 +-
 arch/x86/kernel/process.c         | 64 +++++++++++++++++++--------------------
 arch/xtensa/kernel/process.c      |  1 +
 drivers/acpi/processor_idle.c     | 42 ++++++++++++++-----------
 drivers/cpuidle/cpuidle.c         | 21 +++++++------
 drivers/cpuidle/poll_state.c      |  6 +++-
 include/linux/compiler_types.h    |  8 +++--
 include/linux/cpu.h               |  3 --
 include/linux/sched/idle.h        | 29 ++++++++++++++++++
 kernel/rcu/tree.c                 |  9 ++----
 kernel/sched/idle.c               | 47 ++++++++--------------------
 kernel/time/tick-broadcast.c      |  2 +-
 tools/objtool/check.c             | 15 ++++++++-
 40 files changed, 176 insertions(+), 150 deletions(-)




* [RFC][PATCH 1/9] x86/perf/amd: Remove tracing from perf_lopwr_cb()
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 2/9] x86/idle: Replace x86_idle with a static_call Peter Zijlstra
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86, eranian; +Cc: linux-kernel, peterz, jpoimboe

The perf_lopwr_cb() is called from the idle routines; there is no RCU
there, so we must not enter tracing.
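
For background: the regular wrmsrl()/rdmsrl() wrappers can fire the MSR
tracepoints, which require RCU to be watching. A condensed sketch (not
the exact <asm/msr.h> definitions, which go through native_write_msr()):

	static inline void wrmsrl(u32 msr, u64 val)
	{
		__wrmsr(msr, (u32)val, (u32)(val >> 32));
		if (tracepoint_enabled(write_msr))	/* approximate */
			do_trace_write_msr(msr, val, 0);	/* uses RCU */
	}

Hence the switch to the raw __wrmsr()/__rdmsr() accessors below.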

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/events/amd/brs.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -41,18 +41,15 @@ static inline unsigned int brs_to(int id
 	return MSR_AMD_SAMP_BR_FROM + 2 * idx + 1;
 }
 
-static inline void set_debug_extn_cfg(u64 val)
+static __always_inline void set_debug_extn_cfg(u64 val)
 {
 	/* bits[4:3] must always be set to 11b */
-	wrmsrl(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3);
+	__wrmsr(MSR_AMD_DBG_EXTN_CFG, val | 3ULL << 3, val >> 32);
 }
 
-static inline u64 get_debug_extn_cfg(void)
+static __always_inline u64 get_debug_extn_cfg(void)
 {
-	u64 val;
-
-	rdmsrl(MSR_AMD_DBG_EXTN_CFG, val);
-	return val;
+	return __rdmsr(MSR_AMD_DBG_EXTN_CFG);
 }
 
 static bool __init amd_brs_detect(void)
@@ -338,7 +335,7 @@ void amd_pmu_brs_sched_task(struct perf_
  * called from ACPI processor_idle.c or acpi_pad.c
  * with interrupts disabled
  */
-void perf_amd_brs_lopwr_cb(bool lopwr_in)
+void noinstr perf_amd_brs_lopwr_cb(bool lopwr_in)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	union amd_debug_extn_cfg cfg;




* [RFC][PATCH 2/9] x86/idle: Replace x86_idle with a static_call
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 1/9] x86/perf/amd: Remove tracing from perf_lopwr_cb() Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-30 11:07   ` Frederic Weisbecker
  2022-05-19 21:27 ` [RFC][PATCH 3/9] cpuidle: Move IRQ state validation Peter Zijlstra
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

Typical boot time setup; no need to suffer an indirect call for that.
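
For anyone unfamiliar with the API, a minimal usage sketch (illustrative
names, not the patch's):

	#include <linux/static_call.h>

	static void my_default(void) { }

	/* _NULL: starts out NULL; static_call_query() returns NULL */
	DEFINE_STATIC_CALL_NULL(my_func, my_default);

	void my_setup(void)
	{
		/* patches every call site to call my_default directly */
		static_call_update(my_func, my_default);
	}

	void my_caller(void)
	{
		static_call(my_func)();	/* direct call, no indirection */
	}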

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/process.c |   50 +++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include <linux/cpuidle.h>
 #include <linux/acpi.h>
 #include <linux/elf-randomize.h>
+#include <linux/static_call.h>
 #include <trace/events/power.h>
 #include <linux/hw_breakpoint.h>
 #include <asm/cpu.h>
@@ -692,7 +693,23 @@ void __switch_to_xtra(struct task_struct
 unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-static void (*x86_idle)(void);
+/*
+ * We use this if we don't have any better idle routine..
+ */
+void __cpuidle default_idle(void)
+{
+	raw_safe_halt();
+}
+#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
+EXPORT_SYMBOL(default_idle);
+#endif
+
+DEFINE_STATIC_CALL_NULL(x86_idle, default_idle);
+
+static bool x86_idle_set(void)
+{
+	return !!static_call_query(x86_idle);
+}
 
 #ifndef CONFIG_SMP
 static inline void play_dead(void)
@@ -715,28 +732,17 @@ void arch_cpu_idle_dead(void)
 /*
  * Called from the generic idle code.
  */
-void arch_cpu_idle(void)
-{
-	x86_idle();
-}
-
-/*
- * We use this if we don't have any better idle routine..
- */
-void __cpuidle default_idle(void)
+void __cpuidle arch_cpu_idle(void)
 {
-	raw_safe_halt();
+	static_call(x86_idle)();
 }
-#if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
-EXPORT_SYMBOL(default_idle);
-#endif
 
 #ifdef CONFIG_XEN
 bool xen_set_default_idle(void)
 {
-	bool ret = !!x86_idle;
+	bool ret = x86_idle_set();
 
-	x86_idle = default_idle;
+	static_call_update(x86_idle, default_idle);
 
 	return ret;
 }
@@ -859,20 +865,20 @@ void select_idle_routine(const struct cp
 	if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
 		pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
 #endif
-	if (x86_idle || boot_option_idle_override == IDLE_POLL)
+	if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
 		return;
 
 	if (boot_cpu_has_bug(X86_BUG_AMD_E400)) {
 		pr_info("using AMD E400 aware idle routine\n");
-		x86_idle = amd_e400_idle;
+		static_call_update(x86_idle, amd_e400_idle);
 	} else if (prefer_mwait_c1_over_halt(c)) {
 		pr_info("using mwait in idle threads\n");
-		x86_idle = mwait_idle;
+		static_call_update(x86_idle, mwait_idle);
 	} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
 		pr_info("using TDX aware idle routine\n");
-		x86_idle = tdx_safe_halt;
+		static_call_update(x86_idle, tdx_safe_halt);
 	} else
-		x86_idle = default_idle;
+		static_call_update(x86_idle, default_idle);
 }
 
 void amd_e400_c1e_apic_setup(void)
@@ -925,7 +931,7 @@ static int __init idle_setup(char *str)
 		 * To continue to load the CPU idle driver, don't touch
 		 * the boot_option_idle_override.
 		 */
-		x86_idle = default_idle;
+		static_call_update(x86_idle, default_idle);
 		boot_option_idle_override = IDLE_HALT;
 	} else if (!strcmp(str, "nomwait")) {
 		/*




* [RFC][PATCH 3/9] cpuidle: Move IRQ state validation
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 1/9] x86/perf/amd: Remove tracing from perf_lopwr_cb() Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 2/9] x86/idle: Replace x86_idle with a static_call Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-30 11:36   ` Frederic Weisbecker
  2022-05-19 21:27 ` [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage Peter Zijlstra
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

Make cpuidle_enter_state() consistent with the s2idle variant and
verify ->enter() always returns with interrupts disabled.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/cpuidle/cpuidle.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -234,7 +234,11 @@ int cpuidle_enter_state(struct cpuidle_d
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_enter();
+
 	entered_state = target_state->enter(dev, drv, index);
+	if (WARN_ON_ONCE(!irqs_disabled()))
+		raw_local_irq_disable();
+
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_exit();
 	start_critical_timings();
@@ -246,12 +250,8 @@ int cpuidle_enter_state(struct cpuidle_d
 	/* The cpu is no longer idle or about to enter idle. */
 	sched_idle_set_state(NULL);
 
-	if (broadcast) {
-		if (WARN_ON_ONCE(!irqs_disabled()))
-			local_irq_disable();
-
+	if (broadcast)
 		tick_broadcast_exit();
-	}
 
 	if (!cpuidle_state_is_coupled(drv, index))
 		local_irq_enable();




* [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
                   ` (2 preceding siblings ...)
  2022-05-19 21:27 ` [RFC][PATCH 3/9] cpuidle: Move IRQ state validation Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-20 23:29   ` Hillf Danton
  2022-05-23  9:46   ` Gautham R. Shenoy
  2022-05-19 21:27 ` [RFC][PATCH 5/9] rcu: Fix rcu_idle_exit() Peter Zijlstra
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

The whole disable-RCU, enable-IRQs dance is very intricate since
changing IRQ state is traced, which depends on RCU.

Add two helpers for the cpuidle case that mirror the entry code.
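
With these, the idle path (see default_idle_call() below) condenses to
roughly:

	instrumentation_begin();
	stop_critical_timings();
	cpuidle_rcu_enter();		/* trace/lockdep IRQs-on, RCU off */
	arch_cpu_idle();		/* may raw-enable IRQs internally */
	raw_local_irq_disable();
	cpuidle_rcu_exit();		/* RCU back on, lockdep IRQs-off */
	start_critical_timings();
	instrumentation_end();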

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm/mach-imx/cpuidle-imx6q.c |    5 ++--
 drivers/acpi/processor_idle.c     |   18 +++++++++------
 drivers/cpuidle/cpuidle.c         |   11 +++++----
 include/linux/sched/idle.h        |   29 ++++++++++++++++++++++++
 kernel/sched/idle.c               |   45 ++++++++++----------------------------
 kernel/time/tick-broadcast.c      |    2 -
 6 files changed, 62 insertions(+), 48 deletions(-)

--- a/arch/arm/mach-imx/cpuidle-imx6q.c
+++ b/arch/arm/mach-imx/cpuidle-imx6q.c
@@ -5,6 +5,7 @@
 
 #include <linux/cpuidle.h>
 #include <linux/module.h>
+#include <linux/sched/idle.h>
 #include <asm/cpuidle.h>
 
 #include <soc/imx/cpuidle.h>
@@ -24,9 +25,9 @@ static int imx6q_enter_wait(struct cpuid
 		imx6_set_lpm(WAIT_UNCLOCKED);
 	raw_spin_unlock(&cpuidle_lock);
 
-	rcu_idle_enter();
+	cpuidle_rcu_enter();
 	cpu_do_idle();
-	rcu_idle_exit();
+	cpuidle_rcu_exit();
 
 	raw_spin_lock(&cpuidle_lock);
 	if (num_idle_cpus-- == num_online_cpus())
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -607,7 +607,7 @@ static DEFINE_RAW_SPINLOCK(c3_lock);
  * @cx: Target state context
  * @index: index of target state
  */
-static int acpi_idle_enter_bm(struct cpuidle_driver *drv,
+static noinstr int acpi_idle_enter_bm(struct cpuidle_driver *drv,
 			       struct acpi_processor *pr,
 			       struct acpi_processor_cx *cx,
 			       int index)
@@ -626,6 +626,8 @@ static int acpi_idle_enter_bm(struct cpu
 	 */
 	bool dis_bm = pr->flags.bm_control;
 
+	instrumentation_begin();
+
 	/* If we can skip BM, demote to a safe state. */
 	if (!cx->bm_sts_skip && acpi_idle_bm_check()) {
 		dis_bm = false;
@@ -647,11 +649,11 @@ static int acpi_idle_enter_bm(struct cpu
 		raw_spin_unlock(&c3_lock);
 	}
 
-	rcu_idle_enter();
+	cpuidle_rcu_enter();
 
 	acpi_idle_do_entry(cx);
 
-	rcu_idle_exit();
+	cpuidle_rcu_exit();
 
 	/* Re-enable bus master arbitration */
 	if (dis_bm) {
@@ -661,11 +663,13 @@ static int acpi_idle_enter_bm(struct cpu
 		raw_spin_unlock(&c3_lock);
 	}
 
+	instrumentation_end();
+
 	return index;
 }
 
-static int acpi_idle_enter(struct cpuidle_device *dev,
-			   struct cpuidle_driver *drv, int index)
+static noinstr int acpi_idle_enter(struct cpuidle_device *dev,
+				   struct cpuidle_driver *drv, int index)
 {
 	struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
 	struct acpi_processor *pr;
@@ -693,8 +697,8 @@ static int acpi_idle_enter(struct cpuidl
 	return index;
 }
 
-static int acpi_idle_enter_s2idle(struct cpuidle_device *dev,
-				  struct cpuidle_driver *drv, int index)
+static noinstr int acpi_idle_enter_s2idle(struct cpuidle_device *dev,
+					  struct cpuidle_driver *drv, int index)
 {
 	struct acpi_processor_cx *cx = per_cpu(acpi_cstate[index], dev->cpu);
 
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -13,6 +13,7 @@
 #include <linux/mutex.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
+#include <linux/sched/idle.h>
 #include <linux/notifier.h>
 #include <linux/pm_qos.h>
 #include <linux/cpu.h>
@@ -150,12 +151,12 @@ static void enter_s2idle_proper(struct c
 	 */
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
-		rcu_idle_enter();
+		cpuidle_rcu_enter();
 	target_state->enter_s2idle(dev, drv, index);
 	if (WARN_ON_ONCE(!irqs_disabled()))
-		local_irq_disable();
+		raw_local_irq_disable();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
-		rcu_idle_exit();
+		cpuidle_rcu_enter();
 	tick_unfreeze();
 	start_critical_timings();
 
@@ -233,14 +234,14 @@ int cpuidle_enter_state(struct cpuidle_d
 
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
-		rcu_idle_enter();
+		cpuidle_rcu_enter();
 
 	entered_state = target_state->enter(dev, drv, index);
 	if (WARN_ON_ONCE(!irqs_disabled()))
 		raw_local_irq_disable();
 
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
-		rcu_idle_exit();
+		cpuidle_rcu_exit();
 	start_critical_timings();
 
 	sched_clock_idle_wakeup_event();
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -88,4 +88,33 @@ static inline void current_clr_polling(v
 	preempt_fold_need_resched();
 }
 
+static __always_inline void cpuidle_rcu_enter(void)
+{
+	lockdep_assert_irqs_disabled();
+	/*
+	 * Idle is allowed to (temporarily) enable IRQs. It
+	 * will return with IRQs disabled.
+	 *
+	 * Trace IRQs enable here, then switch off RCU, and have
+	 * arch_cpu_idle() use raw_local_irq_enable(). Note that
+	 * rcu_idle_enter() relies on lockdep IRQ state, so switch that
+	 * last -- this is very similar to the entry code.
+	 */
+	trace_hardirqs_on_prepare();
+	lockdep_hardirqs_on_prepare();
+	instrumentation_end();
+	rcu_idle_enter();
+	lockdep_hardirqs_on(_THIS_IP_);
+}
+
+static __always_inline void cpuidle_rcu_exit(void)
+{
+	/*
+	 * Carefully undo the above.
+	 */
+	lockdep_hardirqs_off(_THIS_IP_);
+	rcu_idle_exit();
+	instrumentation_begin();
+}
+
 #endif /* _LINUX_SCHED_IDLE_H */
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -51,18 +51,22 @@ __setup("hlt", cpu_idle_nopoll_setup);
 
 static noinline int __cpuidle cpu_idle_poll(void)
 {
+	instrumentation_begin();
 	trace_cpu_idle(0, smp_processor_id());
 	stop_critical_timings();
-	rcu_idle_enter();
-	local_irq_enable();
+	cpuidle_rcu_enter();
 
+	raw_local_irq_enable();
 	while (!tif_need_resched() &&
 	       (cpu_idle_force_poll || tick_check_broadcast_expired()))
 		cpu_relax();
+	raw_local_irq_disable();
 
-	rcu_idle_exit();
+	cpuidle_rcu_exit();
 	start_critical_timings();
 	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+	local_irq_enable();
+	instrumentation_end();
 
 	return 1;
 }
@@ -85,44 +87,21 @@ void __weak arch_cpu_idle(void)
  */
 void __cpuidle default_idle_call(void)
 {
-	if (current_clr_polling_and_test()) {
-		local_irq_enable();
-	} else {
-
+	instrumentation_begin();
+	if (!current_clr_polling_and_test()) {
 		trace_cpu_idle(1, smp_processor_id());
 		stop_critical_timings();
 
-		/*
-		 * arch_cpu_idle() is supposed to enable IRQs, however
-		 * we can't do that because of RCU and tracing.
-		 *
-		 * Trace IRQs enable here, then switch off RCU, and have
-		 * arch_cpu_idle() use raw_local_irq_enable(). Note that
-		 * rcu_idle_enter() relies on lockdep IRQ state, so switch that
-		 * last -- this is very similar to the entry code.
-		 */
-		trace_hardirqs_on_prepare();
-		lockdep_hardirqs_on_prepare();
-		rcu_idle_enter();
-		lockdep_hardirqs_on(_THIS_IP_);
-
+		cpuidle_rcu_enter();
 		arch_cpu_idle();
-
-		/*
-		 * OK, so IRQs are enabled here, but RCU needs them disabled to
-		 * turn itself back on.. funny thing is that disabling IRQs
-		 * will cause tracing, which needs RCU. Jump through hoops to
-		 * make it 'work'.
-		 */
 		raw_local_irq_disable();
-		lockdep_hardirqs_off(_THIS_IP_);
-		rcu_idle_exit();
-		lockdep_hardirqs_on(_THIS_IP_);
-		raw_local_irq_enable();
+		cpuidle_rcu_exit();
 
 		start_critical_timings();
 		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	}
+	local_irq_enable();
+	instrumentation_end();
 }
 
 static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -622,7 +622,7 @@ struct cpumask *tick_get_broadcast_onesh
  * to avoid a deep idle transition as we are about to get the
  * broadcast IPI right away.
  */
-int tick_check_broadcast_expired(void)
+int noinstr tick_check_broadcast_expired(void)
 {
 	return cpumask_test_cpu(smp_processor_id(), tick_broadcast_force_mask);
 }




* [RFC][PATCH 5/9] rcu: Fix rcu_idle_exit()
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
                   ` (3 preceding siblings ...)
  2022-05-19 21:27 ` [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-20 21:00   ` Paul E. McKenney
  2022-05-19 21:27 ` [RFC][PATCH 6/9] acpi_idle: Remove tracing Peter Zijlstra
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

Current rcu_idle_exit() is terminally broken because it uses
local_irq_{save,restore}(), which are traced, and tracing uses RCU.

However, now that all the callers are sure to have IRQs disabled, we
can remove these calls.
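
Spelled out, the old code did (condensed):

	void rcu_idle_exit(void)	/* RCU is not yet watching here */
	{
		unsigned long flags;

		/*
		 * Traced variant: trace_hardirqs_off() fires a
		 * tracepoint, which itself needs RCU to be watching.
		 */
		local_irq_save(flags);
		rcu_eqs_exit(false);
		local_irq_restore(flags);
	}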

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/rcu/tree.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u
  * If you add or remove a call to rcu_idle_enter(), be sure to test with
  * CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_idle_enter(void)
+void noinstr rcu_idle_enter(void)
 {
 	lockdep_assert_irqs_disabled();
 	rcu_eqs_enter(false);
@@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us
  * If you add or remove a call to rcu_idle_exit(), be sure to test with
  * CONFIG_RCU_EQS_DEBUG=y.
  */
-void rcu_idle_exit(void)
+void noinstr rcu_idle_exit(void)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
+	lockdep_assert_irqs_disabled();
 	rcu_eqs_exit(false);
-	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);
 




* [RFC][PATCH 6/9] acpi_idle: Remove tracing
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
                   ` (4 preceding siblings ...)
  2022-05-19 21:27 ` [RFC][PATCH 5/9] rcu: Fix rcu_idle_exit() Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 7/9] cpuidle: Annotate poll_idle() Peter Zijlstra
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

All the idle routines are called with RCU disabled; as such, there must
not be any tracing inside.
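
The traced helpers expand to tracepoint calls around the actual
instruction; condensed (not the exact <linux/irqflags.h> definitions):

	/* traced: unsafe once RCU is disabled */
	#define safe_halt()			\
		do {				\
			trace_hardirqs_on();	\
			raw_safe_halt();	\
		} while (0)

	/* raw: just the instruction, "sti; hlt" on x86 */
	#define raw_safe_halt()		arch_safe_halt()

Hence the conversions to the raw_ variants below.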

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/acpi/processor_idle.c |   24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -108,8 +108,8 @@ static const struct dmi_system_id proces
 static void __cpuidle acpi_safe_halt(void)
 {
 	if (!tif_need_resched()) {
-		safe_halt();
-		local_irq_disable();
+		raw_safe_halt();
+		raw_local_irq_disable();
 	}
 }
 
@@ -524,16 +524,21 @@ static int acpi_idle_bm_check(void)
 	return bm_status;
 }
 
-static void wait_for_freeze(void)
+static __cpuidle void io_idle(unsigned long addr)
 {
+	/* IO port based C-state */
+	inb(addr);
+
 #ifdef	CONFIG_X86
 	/* No delay is needed if we are in guest */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return;
 #endif
-	/* Dummy wait op - must do something useless after P_LVL2 read
-	   because chipsets cannot guarantee that STPCLK# signal
-	   gets asserted in time to freeze execution properly. */
+	/*
+	 * Dummy wait op - must do something useless after P_LVL2 read
+	 * because chipsets cannot guarantee that STPCLK# signal
+	 * gets asserted in time to freeze execution properly.
+	 */
 	inl(acpi_gbl_FADT.xpm_timer_block.address);
 }
 
@@ -553,9 +558,7 @@ static void __cpuidle acpi_idle_do_entry
 	} else if (cx->entry_method == ACPI_CSTATE_HALT) {
 		acpi_safe_halt();
 	} else {
-		/* IO port based C-state */
-		inb(cx->address);
-		wait_for_freeze();
+		io_idle(cx->address);
 	}
 
 	perf_lopwr_cb(false);
@@ -577,8 +580,7 @@ static int acpi_idle_play_dead(struct cp
 		if (cx->entry_method == ACPI_CSTATE_HALT)
 			safe_halt();
 		else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
-			inb(cx->address);
-			wait_for_freeze();
+			io_idle(cx->address);
 		} else
 			return -ENODEV;
 




* [RFC][PATCH 7/9] cpuidle: Annotate poll_idle()
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
                   ` (5 preceding siblings ...)
  2022-05-19 21:27 ` [RFC][PATCH 6/9] acpi_idle: Remove tracing Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 8/9] objtool/idle: Validate __cpuidle code as noinstr Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour Peter Zijlstra
  8 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

The __cpuidle functions will become a noinstr class; as such, they need
explicit annotations.
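
The rule of thumb: a noinstr-class function must bracket any code that
is still allowed to be instrumented. A minimal sketch of the pattern:

	static int __cpuidle some_idle_fn(void)
	{
		/* noinstr context: no tracing, no instrumentation */
		instrumentation_begin();
		/* ... instrumentable work, e.g. local_clock() ... */
		instrumentation_end();
		/* noinstr context again */
		return 0;
	}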

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/cpuidle/poll_state.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -13,7 +13,10 @@
 static int __cpuidle poll_idle(struct cpuidle_device *dev,
 			       struct cpuidle_driver *drv, int index)
 {
-	u64 time_start = local_clock();
+	u64 time_start;
+
+	instrumentation_begin();
+	time_start = local_clock();
 
 	dev->poll_time_limit = false;
 
@@ -37,6 +40,7 @@ static int __cpuidle poll_idle(struct cp
 		}
 	}
 	current_clr_polling();
+	instrumentation_end();
 
 	return index;
 }




* [RFC][PATCH 8/9] objtool/idle: Validate __cpuidle code as noinstr
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
                   ` (6 preceding siblings ...)
  2022-05-19 21:27 ` [RFC][PATCH 7/9] cpuidle: Annotate poll_idle() Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-19 21:27 ` [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour Peter Zijlstra
  8 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/irqflags.h |   11 ++++-------
 arch/x86/include/asm/mwait.h    |    2 +-
 include/linux/compiler_types.h  |    8 ++++++--
 include/linux/cpu.h             |    3 ---
 tools/objtool/check.c           |   15 ++++++++++++++-
 5 files changed, 25 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -8,9 +8,6 @@
 
 #include <asm/nospec-branch.h>
 
-/* Provide __cpuidle; we can't safely include <linux/cpu.h> */
-#define __cpuidle __section(".cpuidle.text")
-
 /*
  * Interrupt control:
  */
@@ -45,13 +42,13 @@ static __always_inline void native_irq_e
 	asm volatile("sti": : :"memory");
 }
 
-static inline __cpuidle void native_safe_halt(void)
+static __always_inline void native_safe_halt(void)
 {
 	mds_idle_clear_cpu_buffers();
 	asm volatile("sti; hlt": : :"memory");
 }
 
-static inline __cpuidle void native_halt(void)
+static __always_inline void native_halt(void)
 {
 	mds_idle_clear_cpu_buffers();
 	asm volatile("hlt": : :"memory");
@@ -84,7 +81,7 @@ static __always_inline void arch_local_i
  * Used in the idle loop; sti takes one instruction cycle
  * to complete:
  */
-static inline __cpuidle void arch_safe_halt(void)
+static __always_inline void arch_safe_halt(void)
 {
 	native_safe_halt();
 }
@@ -93,7 +90,7 @@ static inline __cpuidle void arch_safe_h
  * Used when interrupts are already enabled or to
  * shutdown the processor:
  */
-static inline __cpuidle void halt(void)
+static __always_inline void halt(void)
 {
 	native_halt();
 }
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -104,7 +104,7 @@ static inline void __sti_mwait(unsigned
  * New with Core Duo processors, MWAIT can take some hints based on CPU
  * capability.
  */
-static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
+static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
 	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
 		if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -225,10 +225,14 @@ struct ftrace_likely_data {
 #endif
 
 /* Section for code which can't be instrumented at all */
-#define noinstr								\
-	noinline notrace __attribute((__section__(".noinstr.text")))	\
+#define __noinstr_section(section)					\
+	noinline notrace __attribute((__section__(section)))		\
 	__no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
 
+#define noinstr __noinstr_section(".noinstr.text")
+
+#define __cpuidle __noinstr_section(".cpuidle.text")
+
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -171,9 +171,6 @@ void __noreturn cpu_startup_entry(enum c
 
 void cpu_idle_poll_ctrl(bool enable);
 
-/* Attach to any functions which should be considered cpuidle. */
-#define __cpuidle	__section(".cpuidle.text")
-
 bool cpu_in_idle(unsigned long pc);
 
 void arch_cpu_idle(void);
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -373,7 +373,8 @@ static int decode_instructions(struct ob
 			sec->text = true;
 
 		if (!strcmp(sec->name, ".noinstr.text") ||
-		    !strcmp(sec->name, ".entry.text"))
+		    !strcmp(sec->name, ".entry.text") ||
+		    !strcmp(sec->name, ".cpuidle.text"))
 			sec->noinstr = true;
 
 		for (offset = 0; offset < sec->sh.sh_size; offset += insn->len) {
@@ -3077,6 +3078,12 @@ static inline bool noinstr_call_dest(str
 		return true;
 
 	/*
+	 * If the symbol is a static_call trampoline, we can't tell.
+	 */
+	if (func->static_call_tramp)
+		return true;
+
+	/*
 	 * The __ubsan_handle_*() calls are like WARN(), they only happen when
 	 * something 'BAD' happened. At the risk of taking the machine down,
 	 * let them proceed to get the message out.
@@ -3645,6 +3652,12 @@ static int validate_noinstr_sections(str
 	if (sec) {
 		warnings += validate_section(file, sec);
 		warnings += validate_unwind_hints(file, sec);
+	}
+
+	sec = find_section_by_name(file->elf, ".cpuidle.text");
+	if (sec) {
+		warnings += validate_section(file, sec);
+		warnings += validate_unwind_hints(file, sec);
 	}
 
 	return warnings;




* [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
                   ` (7 preceding siblings ...)
  2022-05-19 21:27 ` [RFC][PATCH 8/9] objtool/idle: Validate __cpuidle code as noinstr Peter Zijlstra
@ 2022-05-19 21:27 ` Peter Zijlstra
  2022-05-19 22:03   ` Peter Zijlstra
  2022-05-23 10:02   ` Gautham R. Shenoy
  8 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 21:27 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86; +Cc: linux-kernel, peterz, jpoimboe

Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.
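
In terms of the calling convention, the change is (sketch):

	/* old contract */
	arch_cpu_idle();		/* returns with IRQs enabled */
	raw_local_irq_disable();	/* generic code flips them right back */

	/* new contract */
	arch_cpu_idle();		/* returns with IRQs disabled */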

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/alpha/kernel/process.c      |    1 -
 arch/arc/kernel/process.c        |    3 +++
 arch/arm/kernel/process.c        |    1 -
 arch/arm/mach-gemini/board-dt.c  |    3 ++-
 arch/arm64/kernel/idle.c         |    1 -
 arch/csky/kernel/process.c       |    1 -
 arch/csky/kernel/smp.c           |    2 +-
 arch/h8300/kernel/process.c      |    1 +
 arch/hexagon/kernel/process.c    |    1 -
 arch/ia64/kernel/process.c       |    1 +
 arch/microblaze/kernel/process.c |    1 -
 arch/mips/kernel/idle.c          |    8 +++-----
 arch/nios2/kernel/process.c      |    1 -
 arch/openrisc/kernel/process.c   |    1 +
 arch/parisc/kernel/process.c     |    2 --
 arch/powerpc/kernel/idle.c       |    5 ++---
 arch/riscv/kernel/process.c      |    1 -
 arch/s390/kernel/idle.c          |    1 -
 arch/sh/kernel/idle.c            |    1 +
 arch/sparc/kernel/leon_pmc.c     |    4 ++++
 arch/sparc/kernel/process_32.c   |    1 -
 arch/sparc/kernel/process_64.c   |    3 ++-
 arch/um/kernel/process.c         |    1 -
 arch/x86/coco/tdx/tdx.c          |    3 +++
 arch/x86/kernel/process.c        |   14 ++++----------
 arch/xtensa/kernel/process.c     |    1 +
 kernel/sched/idle.c              |    2 --
 27 files changed, 29 insertions(+), 36 deletions(-)

--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,6 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
 	wtint(0);
-	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -114,6 +114,8 @@ void arch_cpu_idle(void)
 		"sleep %0	\n"
 		:
 		:"I"(arg)); /* can't be "r" has to be embedded const */
+
+	raw_local_irq_disable();
 }
 
 #else	/* ARC700 */
@@ -122,6 +124,7 @@ void arch_cpu_idle(void)
 {
 	/* sleep, but enable both set E1/E2 (levels of interrupts) before committing */
 	__asm__ __volatile__("sleep 0x3	\n");
+	raw_local_irq_disable();
 }
 
 #endif
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -78,7 +78,6 @@ void arch_cpu_idle(void)
 		arm_pm_idle();
 	else
 		cpu_do_idle();
-	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
--- a/arch/arm/mach-gemini/board-dt.c
+++ b/arch/arm/mach-gemini/board-dt.c
@@ -42,8 +42,9 @@ static void gemini_idle(void)
 	 */
 
 	/* FIXME: Enabling interrupts here is racy! */
-	local_irq_enable();
+	raw_local_irq_enable();
 	cpu_do_idle();
+	raw_local_irq_disable();
 }
 
 static void __init gemini_init_machine(void)
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -42,5 +42,4 @@ void noinstr arch_cpu_idle(void)
 	 * tricks
 	 */
 	cpu_do_idle();
-	raw_local_irq_enable();
 }
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -102,6 +102,5 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
 	asm volatile("stop\n");
 #endif
-	raw_local_irq_enable();
 }
 #endif
--- a/arch/csky/kernel/smp.c
+++ b/arch/csky/kernel/smp.c
@@ -314,7 +314,7 @@ void arch_cpu_idle_dead(void)
 	while (!secondary_stack)
 		arch_cpu_idle();
 
-	local_irq_disable();
+	raw_local_irq_disable();
 
 	asm volatile(
 		"mov	sp, %0\n"
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -59,6 +59,7 @@ void arch_cpu_idle(void)
 {
 	raw_local_irq_enable();
 	__asm__("sleep");
+	raw_local_irq_disable();
 }
 
 void machine_restart(char *__unused)
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,6 @@ void arch_cpu_idle(void)
 {
 	__vmwait();
 	/*  interrupts wake us up, but irqs are still disabled */
-	raw_local_irq_enable();
 }
 
 /*
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -241,6 +241,7 @@ void arch_cpu_idle(void)
 		(*mark_idle)(1);
 
 	raw_safe_halt();
+	raw_local_irq_disable();
 
 	if (mark_idle)
 		(*mark_idle)(0);
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -138,5 +138,4 @@ int dump_fpu(struct pt_regs *regs, elf_f
 
 void arch_cpu_idle(void)
 {
-       raw_local_irq_enable();
 }
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -33,13 +33,13 @@ static void __cpuidle r3081_wait(void)
 {
 	unsigned long cfg = read_c0_conf();
 	write_c0_conf(cfg | R30XX_CONF_HALT);
-	raw_local_irq_enable();
 }
 
 void __cpuidle r4k_wait(void)
 {
 	raw_local_irq_enable();
 	__r4k_wait();
+	raw_local_irq_disable();
 }
 
 /*
@@ -57,7 +57,6 @@ void __cpuidle r4k_wait_irqoff(void)
 		"	.set	arch=r4000	\n"
 		"	wait			\n"
 		"	.set	pop		\n");
-	raw_local_irq_enable();
 }
 
 /*
@@ -77,7 +76,6 @@ static void __cpuidle rm7k_wait_irqoff(v
 		"	wait						\n"
 		"	mtc0	$1, $12		# stalls until W stage	\n"
 		"	.set	pop					\n");
-	raw_local_irq_enable();
 }
 
 /*
@@ -103,6 +101,8 @@ static void __cpuidle au1k_wait(void)
 	"	nop				\n"
 	"	.set	pop			\n"
 	: : "r" (au1k_wait), "r" (c0status));
+
+	raw_local_irq_disable();
 }
 
 static int __initdata nowait;
@@ -245,8 +245,6 @@ void arch_cpu_idle(void)
 {
 	if (cpu_wait)
 		cpu_wait();
-	else
-		raw_local_irq_enable();
 }
 
 #ifdef CONFIG_CPU_IDLE
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -33,7 +33,6 @@ EXPORT_SYMBOL(pm_power_off);
 
 void arch_cpu_idle(void)
 {
-	raw_local_irq_enable();
 }
 
 /*
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -87,6 +87,7 @@ void arch_cpu_idle(void)
 	raw_local_irq_enable();
 	if (mfspr(SPR_UPR) & SPR_UPR_PMP)
 		mtspr(SPR_PMR, mfspr(SPR_PMR) | SPR_PMR_DME);
+	raw_local_irq_disable();
 }
 
 void (*pm_power_off) (void) = machine_power_off;
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -187,8 +187,6 @@ void arch_cpu_idle_dead(void)
 
 void __cpuidle arch_cpu_idle(void)
 {
-	raw_local_irq_enable();
-
 	/* nop on real hardware, qemu will idle sleep. */
 	asm volatile("or %%r10,%%r10,%%r10\n":::);
 }
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -51,10 +51,9 @@ void arch_cpu_idle(void)
 		 * Some power_save functions return with
 		 * interrupts enabled, some don't.
 		 */
-		if (irqs_disabled())
-			raw_local_irq_enable();
+		if (!irqs_disabled())
+			raw_local_irq_disable();
 	} else {
-		raw_local_irq_enable();
 		/*
 		 * Go into low thread priority and possibly
 		 * low power mode.
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -39,7 +39,6 @@ extern asmlinkage void ret_from_kernel_t
 void arch_cpu_idle(void)
 {
 	cpu_do_idle();
-	raw_local_irq_enable();
 }
 
 void __show_regs(struct pt_regs *regs)
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -66,7 +66,6 @@ void arch_cpu_idle(void)
 	idle->idle_count++;
 	account_idle_time(cputime_to_nsecs(idle_time));
 	raw_write_seqcount_end(&idle->seqcount);
-	raw_local_irq_enable();
 }
 
 static ssize_t show_idle_count(struct device *dev,
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -25,6 +25,7 @@ void default_idle(void)
 	raw_local_irq_enable();
 	/* Isn't this racy ? */
 	cpu_sleep();
+	raw_local_irq_disable();
 	clear_bl_bit();
 }
 
--- a/arch/sparc/kernel/leon_pmc.c
+++ b/arch/sparc/kernel/leon_pmc.c
@@ -57,6 +57,8 @@ static void pmc_leon_idle_fixup(void)
 		"lda	[%0] %1, %%g0\n"
 		:
 		: "r"(address), "i"(ASI_LEON_BYPASS));
+
+	raw_local_irq_disable();
 }
 
 /*
@@ -70,6 +72,8 @@ static void pmc_leon_idle(void)
 
 	/* For systems without power-down, this will be no-op */
 	__asm__ __volatile__ ("wr	%g0, %asr19\n\t");
+
+	raw_local_irq_disable();
 }
 
 /* Install LEON Power Down function */
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -71,7 +71,6 @@ void arch_cpu_idle(void)
 {
 	if (sparc_idle)
 		(*sparc_idle)();
-	raw_local_irq_enable();
 }
 
 /* XXX cli/sti -> local_irq_xxx here, check this works once SMP is fixed. */
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -59,7 +59,6 @@ void arch_cpu_idle(void)
 {
 	if (tlb_type != hypervisor) {
 		touch_nmi_watchdog();
-		raw_local_irq_enable();
 	} else {
 		unsigned long pstate;
 
@@ -90,6 +89,8 @@ void arch_cpu_idle(void)
 			"wrpr %0, %%g0, %%pstate"
 			: "=&r" (pstate)
 			: "i" (PSTATE_IE));
+
+		raw_local_irq_disable();
 	}
 }
 
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -216,7 +216,6 @@ void arch_cpu_idle(void)
 {
 	cpu_tasks[current_thread_info()->cpu].pid = os_getpid();
 	um_idle_sleep();
-	raw_local_irq_enable();
 }
 
 int __cant_sleep(void) {
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -178,6 +178,9 @@ void __cpuidle tdx_safe_halt(void)
 	 */
 	if (__halt(irq_disabled, do_sti))
 		WARN_ONCE(1, "HLT instruction emulation failed\n");
+
+	/* XXX I can't make sense of what @do_sti actually does */
+	raw_local_irq_disable();
 }
 
 static bool read_msr(struct pt_regs *regs)
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -699,6 +699,7 @@ EXPORT_SYMBOL(boot_option_idle_override)
 void __cpuidle default_idle(void)
 {
 	raw_safe_halt();
+	raw_local_irq_disable();
 }
 #if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
 EXPORT_SYMBOL(default_idle);
@@ -804,13 +805,7 @@ static void amd_e400_idle(void)
 
 	default_idle();
 
-	/*
-	 * The switch back from broadcast mode needs to be called with
-	 * interrupts disabled.
-	 */
-	raw_local_irq_disable();
 	tick_broadcast_exit();
-	raw_local_irq_enable();
 }
 
 /*
@@ -849,12 +844,11 @@ static __cpuidle void mwait_idle(void)
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
+		if (!need_resched()) {
 			__sti_mwait(0, 0);
-		else
-			raw_local_irq_enable();
+			raw_local_irq_disable();
+		}
 	} else {
-		raw_local_irq_enable();
 	}
 	__current_clr_polling();
 }
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -120,6 +120,7 @@ void coprocessor_flush_all(struct thread
 void arch_cpu_idle(void)
 {
 	platform_idle();
+	raw_local_irq_disable();
 }
 
 /*
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -79,7 +79,6 @@ void __weak arch_cpu_idle_dead(void) { }
 void __weak arch_cpu_idle(void)
 {
 	cpu_idle_force_poll = 1;
-	raw_local_irq_enable();
 }
 
 /**
@@ -96,7 +95,6 @@ void __cpuidle default_idle_call(void)
 
 		cpuidle_rcu_enter();
 		arch_cpu_idle();
-		raw_local_irq_disable();
 		cpuidle_rcu_exit();
 
 		start_critical_timings();




* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-19 21:27 ` [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour Peter Zijlstra
@ 2022-05-19 22:03   ` Peter Zijlstra
  2022-05-20  2:20     ` Kirill A. Shutemov
  2022-05-23 10:02   ` Gautham R. Shenoy
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-19 22:03 UTC (permalink / raw)
  To: frederic, paulmck, rjw, x86, kirill.shutemov; +Cc: linux-kernel, jpoimboe


On Thu, May 19, 2022 at 11:27:59PM +0200, Peter Zijlstra wrote:
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -178,6 +178,9 @@ void __cpuidle tdx_safe_halt(void)
>  	 */
>  	if (__halt(irq_disabled, do_sti))
>  		WARN_ONCE(1, "HLT instruction emulation failed\n");
> +
> +	/* XXX I can't make sense of what @do_sti actually does */
> +	raw_local_irq_disable();
>  }
>  

Kirill, Dave says I should prod you :-)


* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-19 22:03   ` Peter Zijlstra
@ 2022-05-20  2:20     ` Kirill A. Shutemov
  2022-05-20  7:06       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20  2:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: frederic, paulmck, rjw, x86, linux-kernel, jpoimboe

On Fri, May 20, 2022 at 12:03:49AM +0200, Peter Zijlstra wrote:
> 
> On Thu, May 19, 2022 at 11:27:59PM +0200, Peter Zijlstra wrote:
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -178,6 +178,9 @@ void __cpuidle tdx_safe_halt(void)
> >  	 */
> >  	if (__halt(irq_disabled, do_sti))
> >  		WARN_ONCE(1, "HLT instruction emulation failed\n");
> > +
> > +	/* XXX I can't make sense of what @do_sti actually does */
> > +	raw_local_irq_disable();
> >  }
> >  
> 
> Kirill, Dave says I should prod you :-)

It calls STI just before doing TDCALL that requests HLT.
See comment above $TDX_HCALL_ISSUE_STI usage in __tdx_hypercall()[1].

__halt(do_sti == true) matches native_safe_halt() semantics (or is
supposed to) and __halt(do_sti == false) corresponds to native_halt().
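
In other words, roughly (the native_ helpers live in <asm/irqflags.h>):

	__halt(irq_disabled, /*do_sti=*/true);	/* ~ native_safe_halt(): "sti; hlt" */
	__halt(irq_disabled, /*do_sti=*/false);	/* ~ native_halt(): "hlt" */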

For context, see Section 3.8 in GHCI[2]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/tree/arch/x86/coco/tdx/tdcall.S?h=x86/tdx#n151
[2] https://www.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface-1.0-344426-002.pdf

-- 
 Kirill A. Shutemov


* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-20  2:20     ` Kirill A. Shutemov
@ 2022-05-20  7:06       ` Peter Zijlstra
  2022-05-20 10:13         ` Kirill A. Shutemov
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-20  7:06 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: frederic, paulmck, rjw, x86, linux-kernel, jpoimboe

On Fri, May 20, 2022 at 05:20:52AM +0300, Kirill A. Shutemov wrote:
> On Fri, May 20, 2022 at 12:03:49AM +0200, Peter Zijlstra wrote:
> > 
> > On Thu, May 19, 2022 at 11:27:59PM +0200, Peter Zijlstra wrote:
> > > --- a/arch/x86/coco/tdx/tdx.c
> > > +++ b/arch/x86/coco/tdx/tdx.c
> > > @@ -178,6 +178,9 @@ void __cpuidle tdx_safe_halt(void)
> > >  	 */
> > >  	if (__halt(irq_disabled, do_sti))
> > >  		WARN_ONCE(1, "HLT instruction emulation failed\n");
> > > +
> > > +	/* XXX I can't make sense of what @do_sti actually does */
> > > +	raw_local_irq_disable();
> > >  }
> > >  
> > 
> > Kirill, Dave says I should prod you :-)
> 
> It calls STI just before doing TDCALL that requests HLT.
> See comment above $TDX_HCALL_ISSUE_STI usage in __tdx_hypercall()[1].

Yes, it says that, but it's useless information since it doesn't
actually tell me the behaviour.

What I'm interested in is the behaviour of the hypercall when:
.irq_disabled=false, .do_sti=false

From what I can tell, irq_disabled=false should have the hypercall wake
on interrupt, do_sti=false should have it not enable interrupts.

But what does it actually do? Because HLT without STI is a dead
machine, but this hypercall looks more like mwait with the irq_disabled
argument...

> 
> __halt(do_sti == true) matches native_safe_halt() semantics (or is
> supposed to) and __halt(do_sti == false) corresponds to native_halt().
> 
> For context, see Section 3.8 in GHCI[2]
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/tree/arch/x86/coco/tdx/tdcall.S?h=x86/tdx#n151
> [2] https://www.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface-1.0-344426-002.pdf

Yeah, that stuff is unreadable garbage. Not going to waste time trying
to make sense of it again.


* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-20  7:06       ` Peter Zijlstra
@ 2022-05-20 10:13         ` Kirill A. Shutemov
  2022-05-20 12:58           ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20 10:13 UTC (permalink / raw)
  To: Peter Zijlstra, Isaku Yamahata
  Cc: Kirill A. Shutemov, frederic, paulmck, rjw, x86, linux-kernel, jpoimboe

On Fri, May 20, 2022 at 09:06:14AM +0200, Peter Zijlstra wrote:
> On Fri, May 20, 2022 at 05:20:52AM +0300, Kirill A. Shutemov wrote:
> > On Fri, May 20, 2022 at 12:03:49AM +0200, Peter Zijlstra wrote:
> > > 
> > > On Thu, May 19, 2022 at 11:27:59PM +0200, Peter Zijlstra wrote:
> > > > --- a/arch/x86/coco/tdx/tdx.c
> > > > +++ b/arch/x86/coco/tdx/tdx.c
> > > > @@ -178,6 +178,9 @@ void __cpuidle tdx_safe_halt(void)
> > > >  	 */
> > > >  	if (__halt(irq_disabled, do_sti))
> > > >  		WARN_ONCE(1, "HLT instruction emulation failed\n");
> > > > +
> > > > +	/* XXX I can't make sense of what @do_sti actually does */
> > > > +	raw_local_irq_disable();
> > > >  }
> > > >  
> > > 
> > > Kirill, Dave says I should prod you :-)
> > 
> > It calls STI just before doing TDCALL that requests HLT.
> > See comment above $TDX_HCALL_ISSUE_STI usage in __tdx_hypercall()[1].
> 
> Yes, it says that, but it's useless information since it doesn't
> actually tell me the behaviour.
> 
> What I'm interested in is the behaviour of the hypercall when:
> .irq_disabled=false, .do_sti=false
> 
> From what I can tell, irq_disabled=false should have the hypercall wake
> on interrupt, do_sti=false should have it not enable interrupts.
> 
> But what does it actually do? Because HLT without STI is a dead
> machine, but this hypercall looks more like mwait with the irq_disabled
> argument...

+Isaku.

So you want to call the HLT hypercall with .irq_disabled=false and
.do_sti=false, but actual RFLAGS.IF in the guest is 0 and avoid CLI on
wake up expecting it to be cleared already, right?

My reading of the spec is "don't do that". But actual behaviour is up to
VMM and TDX module implementation. VMM doens't have access to the guest
register file, so it *may* work, I donno.

Sorry for not being helpful :/

Isaku, maybe you can clarify this?

-- 
 Kirill A. Shutemov


* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-20 10:13         ` Kirill A. Shutemov
@ 2022-05-20 12:58           ` Peter Zijlstra
  2022-05-24 14:55             ` Isaku Yamahata
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-20 12:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Isaku Yamahata, Kirill A. Shutemov, frederic, paulmck, rjw, x86,
	linux-kernel, jpoimboe

On Fri, May 20, 2022 at 01:13:22PM +0300, Kirill A. Shutemov wrote:

> So you want to call the HLT hypercall with .irq_disabled=false and
> .do_sti=false, but actual RFLAGS.IF in the guest is 0 and avoid CLI on
> wake up expecting it to be cleared already, right?

Yep, just like MWAIT can, avoids pointless IF flipping.

> My reading of the spec is "don't do that". But actual behaviour is up to
> VMM and TDX module implementation. VMM doesn't have access to the guest
> register file, so it *may* work, I donno.

Yeah, it totally *can* work, but I've no idea if they done the right
thing.


* Re: [RFC][PATCH 5/9] rcu: Fix rcu_idle_exit()
  2022-05-19 21:27 ` [RFC][PATCH 5/9] rcu: Fix rcu_idle_exit() Peter Zijlstra
@ 2022-05-20 21:00   ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2022-05-20 21:00 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: frederic, rjw, x86, linux-kernel, jpoimboe

On Thu, May 19, 2022 at 11:27:55PM +0200, Peter Zijlstra wrote:
> Current rcu_idle_exit() is terminally broken because it uses
> local_irq_{save,restore}(), which are traced, and tracing uses RCU.
> 
> However, now that all the callers are sure to have IRQs disabled, we
> can remove these calls.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

This looks good to me.  If there are any callers with IRQs still
enabled, the lockdep_assert_irqs_disabled() should catch them.
And yes, after looking at the definition, I agree that it is just
fine to invoke lockdep_assert_irqs_disabled() from noinstr code.
The underlying __WARN_printf() might need an RCU_NONIDLE() or
equivalent, but if so that is a separate issue.

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/rcu/tree.c |    9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -659,7 +659,7 @@ static noinstr void rcu_eqs_enter(bool u
>   * If you add or remove a call to rcu_idle_enter(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_enter(void)
> +void noinstr rcu_idle_enter(void)
>  {
>  	lockdep_assert_irqs_disabled();
>  	rcu_eqs_enter(false);
> @@ -896,13 +896,10 @@ static void noinstr rcu_eqs_exit(bool us
>   * If you add or remove a call to rcu_idle_exit(), be sure to test with
>   * CONFIG_RCU_EQS_DEBUG=y.
>   */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
>  {
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> +	lockdep_assert_irqs_disabled();
>  	rcu_eqs_exit(false);
> -	local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_exit);
>  
> 
> 


* Re: [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage
  2022-05-19 21:27 ` [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage Peter Zijlstra
@ 2022-05-20 23:29   ` Hillf Danton
  2022-05-23  9:46   ` Gautham R. Shenoy
  1 sibling, 0 replies; 24+ messages in thread
From: Hillf Danton @ 2022-05-20 23:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: frederic, paulmck, rjw, x86, linux-mm, linux-kernel, jpoimboe

On Thu, 19 May 2022 23:27:54 +0200 Peter Zijlstra wrote:
> @@ -150,12 +151,12 @@ static void enter_s2idle_proper(struct c
>  	 */
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> -		rcu_idle_enter();
> +		cpuidle_rcu_enter();
>  	target_state->enter_s2idle(dev, drv, index);
>  	if (WARN_ON_ONCE(!irqs_disabled()))
> -		local_irq_disable();
> +		raw_local_irq_disable();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> -		rcu_idle_exit();
> +		cpuidle_rcu_enter();

Why does it need to enter once more instead of exit, if it is not a typo?

Hillf
>  	tick_unfreeze();
>  	start_critical_timings();
>  
> @@ -233,14 +234,14 @@ int cpuidle_enter_state(struct cpuidle_d
>  
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> -		rcu_idle_enter();
> +		cpuidle_rcu_enter();
>  
>  	entered_state = target_state->enter(dev, drv, index);
>  	if (WARN_ON_ONCE(!irqs_disabled()))
>  		raw_local_irq_disable();
>  
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> -		rcu_idle_exit();
> +		cpuidle_rcu_exit();

This, by contrast, looks like the correct sequence.

>  	start_critical_timings();
>  
>  	sched_clock_idle_wakeup_event();



* Re: [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage
  2022-05-19 21:27 ` [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage Peter Zijlstra
  2022-05-20 23:29   ` Hillf Danton
@ 2022-05-23  9:46   ` Gautham R. Shenoy
  2022-05-31  9:33     ` Peter Zijlstra
  1 sibling, 1 reply; 24+ messages in thread
From: Gautham R. Shenoy @ 2022-05-23  9:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: frederic, paulmck, rjw, x86, linux-kernel, jpoimboe

On Thu, May 19, 2022 at 11:27:54PM +0200, Peter Zijlstra wrote:
> The whole disable-RCU, enable-IRQs dance is very intricate since
> changing IRQ state is traced, which depends on RCU.
> 
> Add two helpers for the cpuidle case that mirror the entry code.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

[...]
>  
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -13,6 +13,7 @@
>  #include <linux/mutex.h>
>  #include <linux/sched.h>
>  #include <linux/sched/clock.h>
> +#include <linux/sched/idle.h>
>  #include <linux/notifier.h>
>  #include <linux/pm_qos.h>
>  #include <linux/cpu.h>
> @@ -150,12 +151,12 @@ static void enter_s2idle_proper(struct c
>  	 */
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> -		rcu_idle_enter();
> +		cpuidle_rcu_enter();
>  	target_state->enter_s2idle(dev, drv, index);
>  	if (WARN_ON_ONCE(!irqs_disabled()))
> -		local_irq_disable();
> +		raw_local_irq_disable();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> -		rcu_idle_exit();
> +		cpuidle_rcu_enter();

Shouldn't this be cpuidle_rcu_exit() ?


>  	tick_unfreeze();
>  	start_critical_timings();
>  

--
Thanks and Regards
gautham.


* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-19 21:27 ` [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour Peter Zijlstra
  2022-05-19 22:03   ` Peter Zijlstra
@ 2022-05-23 10:02   ` Gautham R. Shenoy
  1 sibling, 0 replies; 24+ messages in thread
From: Gautham R. Shenoy @ 2022-05-23 10:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: frederic, paulmck, rjw, x86, linux-kernel, jpoimboe

On Thu, May 19, 2022 at 11:27:59PM +0200, Peter Zijlstra wrote:
> Current arch_cpu_idle() is called with IRQs disabled, but will return
> with IRQs enabled.
> 
> However, the very first thing the generic code does after calling
> arch_cpu_idle() is raw_local_irq_disable(). This means that
> architectures that can idle with IRQs disabled end up doing a
> pointless 'enable-disable' dance.
> 
> Therefore, push this IRQ disabling into the idle function, meaning
> that those architectures can avoid the pointless IRQ state flipping.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

[...]

> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -699,6 +699,7 @@ EXPORT_SYMBOL(boot_option_idle_override)
>  void __cpuidle default_idle(void)
>  {
>  	raw_safe_halt();
> +	raw_local_irq_disable();
>  }
>  #if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
>  EXPORT_SYMBOL(default_idle);
> @@ -804,13 +805,7 @@ static void amd_e400_idle(void)
>  
>  	default_idle();
>  
> -	/*
> -	 * The switch back from broadcast mode needs to be called with
> -	 * interrupts disabled.
> -	 */
> -	raw_local_irq_disable();
>  	tick_broadcast_exit();
> -	raw_local_irq_enable();
>  }
>  
>  /*
> @@ -849,12 +844,11 @@ static __cpuidle void mwait_idle(void)
>  		}
>  
>  		__monitor((void *)&current_thread_info()->flags, 0, 0);
> -		if (!need_resched())
> +		if (!need_resched()) {
>  			__sti_mwait(0, 0);
> -		else
> -			raw_local_irq_enable();
> +			raw_local_irq_disable();
> +		}
>  	} else {
> -		raw_local_irq_enable();
>  	}


The else clause is empty now, so we don't need it anymore.

>  	__current_clr_polling();
>  }


--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 24+ messages in thread
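
With the now-empty else dropped, mwait_idle() would end up shaped roughly
like this (a sketch reconstructed from the quoted hunk and the mainline
function of the time; the CLFLUSH monitor quirk handling is elided):

	static __cpuidle void mwait_idle(void)
	{
		if (!current_set_polling_and_test()) {
			/* ... CLFLUSH monitor quirk elided ... */
			__monitor((void *)&current_thread_info()->flags, 0, 0);
			if (!need_resched()) {
				__sti_mwait(0, 0);	 /* wakes with IRQs enabled */
				raw_local_irq_disable(); /* return with IRQs off */
			}
		}
		__current_clr_polling();
	}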

* Re: [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour
  2022-05-20 12:58           ` Peter Zijlstra
@ 2022-05-24 14:55             ` Isaku Yamahata
  0 siblings, 0 replies; 24+ messages in thread
From: Isaku Yamahata @ 2022-05-24 14:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Isaku Yamahata, Kirill A. Shutemov, frederic,
	paulmck, rjw, x86, linux-kernel, jpoimboe, isaku.yamahata

On Fri, May 20, 2022 at 02:58:19PM +0200,
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 20, 2022 at 01:13:22PM +0300, Kirill A. Shutemov wrote:
> 
> > So you want to call the HLT hypercall with .irq_disabled=false and
> > .do_sti=false, but actual RFLAGS.IF in the guest is 0 and avoid CLI on
> > wake up expecting it to be cleared already, right?
> 
> Yep, just like MWAIT can, avoids pointless IF flipping.
> 
> > My reading of the spec is "don't do that". But actual behaviour is up to
> > VMM and TDX module implementation. VMM doens't have access to the guest
> > register file, so it *may* work, I donno.
> 
> Yeah, it totally *can* work, but I've no idea if they've done the right
> thing.

There are two cases to consider when an interrupt arrives:

- If the interrupt arrives after the CPU has started executing the VMM (or
  the TDX module), the VMM can see that an interrupt for the vCPU has
  arrived.  The VMM unblocks vCPU scheduling and the HLT hypercall returns
  to the guest.

- If the interrupt arrives and the vCPU recognizes it before the CPU starts
  executing the VMM (or the TDX module), the interrupt request is recorded
  in vRVI (VMCS.RVI) because vRFLAGS.IF=0.  After that, the CPU exits from
  the guest to the VMM due to the HLT hypercall.
  Before KVM blocks vCPU scheduling, and because irq_disabled=false, TDX KVM
  checks via a TDX SEAMCALL whether a deliverable interrupt event is pending
  (the CPU state is protected, so the VMM can't peek at vRVI and vPPR
  directly; note that vRFLAGS.IF is ignored in this check).  If the vCPU has
  a deliverable pending interrupt, the HLT hypercall returns.

  Anyway, this second scenario isn't tested yet; I need to test it.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH 2/9] x86/idle: Replace x86_idle with a static_call
  2022-05-19 21:27 ` [RFC][PATCH 2/9] x86/idle: Replace x86_idle with a static_call Peter Zijlstra
@ 2022-05-30 11:07   ` Frederic Weisbecker
  0 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2022-05-30 11:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: paulmck, rjw, x86, linux-kernel, jpoimboe

On Thu, May 19, 2022 at 11:27:52PM +0200, Peter Zijlstra wrote:
> Typical boot time setup; no need to suffer an indirect call for that.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH 3/9] cpuidle: Move IRQ state validation
  2022-05-19 21:27 ` [RFC][PATCH 3/9] cpuidle: Move IRQ state validation Peter Zijlstra
@ 2022-05-30 11:36   ` Frederic Weisbecker
  2022-05-30 12:00     ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2022-05-30 11:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: paulmck, rjw, x86, linux-kernel, jpoimboe

On Thu, May 19, 2022 at 11:27:53PM +0200, Peter Zijlstra wrote:
> Make cpuidle_enter_state() consistent with the s2idle variant and
> verify ->enter() always returns with interrupts disabled.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  drivers/cpuidle/cpuidle.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -234,7 +234,11 @@ int cpuidle_enter_state(struct cpuidle_d
>  	stop_critical_timings();
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>  		rcu_idle_enter();
> +
>  	entered_state = target_state->enter(dev, drv, index);
> +	if (WARN_ON_ONCE(!irqs_disabled()))
> +		raw_local_irq_disable();

So this means idle functions are supposed to return with IRQs disabled,
and to disable them without tracing, right? I can see that at least
acpi_safe_halt() uses the non-raw local_irq_disable().

> +
>  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>  		rcu_idle_exit();
>  	start_critical_timings();
> @@ -246,12 +250,8 @@ int cpuidle_enter_state(struct cpuidle_d
>  	/* The cpu is no longer idle or about to enter idle. */
>  	sched_idle_set_state(NULL);
>  
> -	if (broadcast) {
> -		if (WARN_ON_ONCE(!irqs_disabled()))
> -			local_irq_disable();
> -
> +	if (broadcast)
>  		tick_broadcast_exit();
> -	}
>  
>  	if (!cpuidle_state_is_coupled(drv, index))
>  		local_irq_enable();
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread
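
The raw vs. traced distinction Frederic is pointing at: with
CONFIG_TRACE_IRQFLAGS=y the traced variant expands to roughly the
following (simplified from include/linux/irqflags.h; the was-already-disabled
bookkeeping is elided):

	#define local_irq_disable()					\
		do {							\
			raw_local_irq_disable();  /* just the arch CLI */ \
			trace_hardirqs_off();     /* tracing hook, relies on RCU */ \
		} while (0)

So calling local_irq_disable() from an idle callback after rcu_idle_enter()
runs trace_hardirqs_off() while RCU is no longer watching -- exactly the
class of violation this series is hunting.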

* Re: [RFC][PATCH 3/9] cpuidle: Move IRQ state validation
  2022-05-30 11:36   ` Frederic Weisbecker
@ 2022-05-30 12:00     ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-30 12:00 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: paulmck, rjw, x86, linux-kernel, jpoimboe

On Mon, May 30, 2022 at 01:36:40PM +0200, Frederic Weisbecker wrote:
> On Thu, May 19, 2022 at 11:27:53PM +0200, Peter Zijlstra wrote:
> > Make cpuidle_enter_state() consistent with the s2idle variant and
> > verify ->enter() always returns with interrupts disabled.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  drivers/cpuidle/cpuidle.c |   10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -234,7 +234,11 @@ int cpuidle_enter_state(struct cpuidle_d
> >  	stop_critical_timings();
> >  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> >  		rcu_idle_enter();
> > +
> >  	entered_state = target_state->enter(dev, drv, index);
> > +	if (WARN_ON_ONCE(!irqs_disabled()))
> > +		raw_local_irq_disable();
> 
> So it means that idle functions are supposed to return with IRQs disabled
> without tracing, right? I can see that at least acpi_safe_halt() is using
> the non-raw local_irq_disable().

Yeah, I might need to re-order this. 0day also complained about that.
I'll need to find a moment to re-audit this and put it in the right
place.

^ permalink raw reply	[flat|nested] 24+ messages in thread
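
For illustration, bringing acpi_safe_halt() in line with the new contract
could look something like this (a sketch only -- the shape follows the
mainline helper of the time, and the real change belongs to the acpi_idle
patch in this series, which may end up different):

	static __cpuidle void acpi_safe_halt(void)
	{
		if (!tif_need_resched()) {
			raw_safe_halt();		/* STI;HLT without tracing */
			raw_local_irq_disable();	/* return with IRQs off, untraced */
		}
	}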

* Re: [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage
  2022-05-23  9:46   ` Gautham R. Shenoy
@ 2022-05-31  9:33     ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2022-05-31  9:33 UTC (permalink / raw)
  To: Gautham R. Shenoy; +Cc: frederic, paulmck, rjw, x86, linux-kernel, jpoimboe

On Mon, May 23, 2022 at 03:16:11PM +0530, Gautham R. Shenoy wrote:
> On Thu, May 19, 2022 at 11:27:54PM +0200, Peter Zijlstra wrote:
> > The whole disable-RCU, enable-IRQS dance is very intricate since
> > changing IRQ state is traced, which depends on RCU.
> > 
> > Add two helpers for the cpuidle case that mirror the entry code.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> 
> [...]
> >  
> > --- a/drivers/cpuidle/cpuidle.c
> > +++ b/drivers/cpuidle/cpuidle.c
> > @@ -13,6 +13,7 @@
> >  #include <linux/mutex.h>
> >  #include <linux/sched.h>
> >  #include <linux/sched/clock.h>
> > +#include <linux/sched/idle.h>
> >  #include <linux/notifier.h>
> >  #include <linux/pm_qos.h>
> >  #include <linux/cpu.h>
> > @@ -150,12 +151,12 @@ static void enter_s2idle_proper(struct c
> >  	 */
> >  	stop_critical_timings();
> >  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> > -		rcu_idle_enter();
> > +		cpuidle_rcu_enter();
> >  	target_state->enter_s2idle(dev, drv, index);
> >  	if (WARN_ON_ONCE(!irqs_disabled()))
> > -		local_irq_disable();
> > +		raw_local_irq_disable();
> >  	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
> > -		rcu_idle_exit();
> > +		cpuidle_rcu_enter();
> 
> Shouldn't this be cpuidle_rcu_exit() ?

Yes, very much so... soon I should buy a new supply of brown paper bags
:-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-05-31  9:34 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 21:27 [RFC][PATCH 0/9] Rework cpuidle vs instrumentation Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 1/9] x86/perf/amd: Remove tracing from perf_lopwr_cb() Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 2/9] x86/idle: Replace x86_idle with a static_call Peter Zijlstra
2022-05-30 11:07   ` Frederic Weisbecker
2022-05-19 21:27 ` [RFC][PATCH 3/9] cpuidle: Move IRQ state validation Peter Zijlstra
2022-05-30 11:36   ` Frederic Weisbecker
2022-05-30 12:00     ` Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 4/9] idle: Fix rcu_idle_*() usage Peter Zijlstra
2022-05-20 23:29   ` Hillf Danton
2022-05-23  9:46   ` Gautham R. Shenoy
2022-05-31  9:33     ` Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 5/9] rcu: Fix rcu_idle_exit() Peter Zijlstra
2022-05-20 21:00   ` Paul E. McKenney
2022-05-19 21:27 ` [RFC][PATCH 6/9] acpi_idle: Remove tracing Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 7/9] cpuidle: Annotate poll_idle() Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 8/9] objtool/idle: Validate __cpuidle code as noinstr Peter Zijlstra
2022-05-19 21:27 ` [RFC][PATCH 9/9] arch/idle: Change arch_cpu_idle() IRQ behaviour Peter Zijlstra
2022-05-19 22:03   ` Peter Zijlstra
2022-05-20  2:20     ` Kirill A. Shutemov
2022-05-20  7:06       ` Peter Zijlstra
2022-05-20 10:13         ` Kirill A. Shutemov
2022-05-20 12:58           ` Peter Zijlstra
2022-05-24 14:55             ` Isaku Yamahata
2022-05-23 10:02   ` Gautham R. Shenoy
