* [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states
@ 2023-11-24 22:32 Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 1/7] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Peter Zijlstra

The TIF_NR_POLLING handling for polling idle states (mwait and also
software polling) is a bit messy, with quite a few cycles wasted on
useless atomic operations. This is an attempt to consolidate this state
handling in the cpuidle core.

Frederic Weisbecker (4):
  x86: Add a comment about the "magic" behind shadow sti before mwait
  cpuidle: Remove unnecessary current_clr_polling_and_test() from
    haltpoll
  cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT
  cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle
    states

Peter Zijlstra (3):
  x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD
  cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD
    states

 Documentation/driver-api/pm/cpuidle.rst |  2 +-
 arch/x86/include/asm/mwait.h            | 23 +++++++++++---
 drivers/acpi/processor_idle.c           |  3 ++
 drivers/cpuidle/cpuidle-haltpoll.c      |  5 +---
 drivers/cpuidle/cpuidle-powernv.c       | 12 +-------
 drivers/cpuidle/cpuidle-pseries.c       | 15 ++--------
 drivers/cpuidle/cpuidle.c               | 22 +++++++++++++-
 drivers/cpuidle/governors/ladder.c      |  4 +--
 drivers/cpuidle/governors/menu.c        |  8 ++---
 drivers/cpuidle/governors/teo.c         |  8 ++---
 drivers/cpuidle/poll_state.c            | 32 ++++++++------------
 drivers/idle/intel_idle.c               | 24 +++++++--------
 include/linux/cpuidle.h                 |  3 +-
 include/linux/sched/idle.h              |  7 ++++-
 kernel/sched/idle.c                     | 40 +++++++++----------------
 15 files changed, 104 insertions(+), 104 deletions(-)

-- 
2.42.1



* [PATCH 1/7] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 2/7] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Frederic Weisbecker

From: Peter Zijlstra <peterz@infradead.org>

intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
may fire before mwait() is eventually called. If such an interrupt queues
a timer, it may go unnoticed until mwait returns and the idle loop
handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
help because a local timer enqueue doesn't set that flag.

The issue is mitigated by the fact that this idle handler is only invoked
for shallow C-states when, presumably, the next tick is supposed to be
close enough. There may still be rare cases though when the next tick
is far away and the selected C-state is shallow, resulting in a timer
getting ignored for a while.

Fix this by using __sti_mwait(), which re-enables IRQs only upon
executing mwait, closing the race while keeping the interrupt latency
within acceptable bounds.
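
As a rough illustration of the resulting control flow, the following
user-space sketch models which entry path mwait_idle_with_hints() takes
after this patch. The names model_ecx(), model_mwait_path() and the
mwait_path enum are hypothetical and exist only for this example; the
real code executes privileged instructions that cannot run here.

```c
#include <assert.h>
#include <stdbool.h>

/* ECX bit 0 of mwait: treat interrupts as break events even if masked. */
#define MWAIT_ECX_INTERRUPT_BREAK 0x1UL

/* Hypothetical labels for the three outcomes of the patched function. */
enum mwait_path {
	PATH_SKIP,		/* need_resched() is set: do not idle at all */
	PATH_MWAIT_IRQS_OFF,	/* plain mwait, IRQs stay disabled */
	PATH_STI_MWAIT,		/* sti immediately followed by mwait */
};

/* Mirrors "ecx = 1*irqoff" in __intel_idle(). */
static unsigned long model_ecx(bool irqoff)
{
	return irqoff ? MWAIT_ECX_INTERRUPT_BREAK : 0;
}

/* Mirrors the branch added to mwait_idle_with_hints(). */
static enum mwait_path model_mwait_path(unsigned long ecx, bool need_resched)
{
	/* This model only understands bit 0 of ecx. */
	assert((ecx & ~MWAIT_ECX_INTERRUPT_BREAK) == 0);

	if (need_resched)
		return PATH_SKIP;

	return (ecx & MWAIT_ECX_INTERRUPT_BREAK) ? PATH_MWAIT_IRQS_OFF
						 : PATH_STI_MWAIT;
}
```

In this model, intel_idle_irq() corresponds to model_ecx(false), so it
is the only caller that ends up on the sti+mwait path.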

Fixes: c227233ad64c ("intel_idle: enable interrupts before C1 on Xeons")
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/include/asm/mwait.h | 11 +++++++++--
 drivers/idle/intel_idle.c    | 19 +++++++------------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..bae83810505b 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -115,8 +115,15 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 		}
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
-			__mwait(eax, ecx);
+
+		if (!need_resched()) {
+			if (ecx & 1) {
+				__mwait(eax, ecx);
+			} else {
+				__sti_mwait(eax, ecx);
+				raw_local_irq_disable();
+			}
+		}
 	}
 	current_clr_polling();
 }
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index dcda0afecfc5..3e01a6b23e75 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -131,11 +131,12 @@ static unsigned int mwait_substates __initdata;
 #define MWAIT2flg(eax) ((eax & 0xFF) << 24)
 
 static __always_inline int __intel_idle(struct cpuidle_device *dev,
-					struct cpuidle_driver *drv, int index)
+					struct cpuidle_driver *drv,
+					int index, bool irqoff)
 {
 	struct cpuidle_state *state = &drv->states[index];
 	unsigned long eax = flg2MWAIT(state->flags);
-	unsigned long ecx = 1; /* break on interrupt flag */
+	unsigned long ecx = 1*irqoff; /* break on interrupt flag */
 
 	mwait_idle_with_hints(eax, ecx);
 
@@ -159,19 +160,13 @@ static __always_inline int __intel_idle(struct cpuidle_device *dev,
 static __cpuidle int intel_idle(struct cpuidle_device *dev,
 				struct cpuidle_driver *drv, int index)
 {
-	return __intel_idle(dev, drv, index);
+	return __intel_idle(dev, drv, index, true);
 }
 
 static __cpuidle int intel_idle_irq(struct cpuidle_device *dev,
 				    struct cpuidle_driver *drv, int index)
 {
-	int ret;
-
-	raw_local_irq_enable();
-	ret = __intel_idle(dev, drv, index);
-	raw_local_irq_disable();
-
-	return ret;
+	return __intel_idle(dev, drv, index, false);
 }
 
 static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
@@ -184,7 +179,7 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
 	if (smt_active)
 		__update_spec_ctrl(0);
 
-	ret = __intel_idle(dev, drv, index);
+	ret = __intel_idle(dev, drv, index, true);
 
 	if (smt_active)
 		__update_spec_ctrl(spec_ctrl);
@@ -196,7 +191,7 @@ static __cpuidle int intel_idle_xstate(struct cpuidle_device *dev,
 				       struct cpuidle_driver *drv, int index)
 {
 	fpu_idle_fpregs();
-	return __intel_idle(dev, drv, index);
+	return __intel_idle(dev, drv, index, true);
 }
 
 /**
-- 
2.42.1



* [PATCH 2/7] x86: Add a comment about the "magic" behind shadow sti before mwait
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 1/7] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 3/7] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Peter Zijlstra

Add a note so that the requirements behind it are never overlooked
or broken.

Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/include/asm/mwait.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index bae83810505b..920426d691ce 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
 		     :: "a" (eax), "b" (ebx), "c" (ecx));
 }
 
+/*
+ * Re-enable interrupts right upon calling mwait in such a way that
+ * no interrupt can fire _before_ the execution of mwait, ie: no
+ * instruction must be placed between "sti" and "mwait".
+ *
+ * This is necessary because if an interrupt queues a timer before
+ * executing mwait, it would otherwise go unnoticed and the next tick
+ * would not be reprogrammed accordingly before mwait ever wakes up.
+ */
 static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	mds_idle_clear_cpu_buffers();
-- 
2.42.1



* [PATCH 3/7] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 1/7] x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 2/7] x86: Add a comment about the "magic" behind shadow sti before mwait Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 4/7] cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT Frederic Weisbecker
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Peter Zijlstra, Marcelo Tosatti

When a cpuidle driver's ->enter() callback is called, the TIF_NR_POLLING
flag has already been cleared and TIF_NEED_RESCHED has already been
checked by call_cpuidle().

Calling current_clr_polling_and_test() here is therefore redundant: any
later setting of TIF_NEED_RESCHED will result in an IPI and thus an
idle loop exit. The call can be safely removed.

Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/cpuidle/cpuidle-haltpoll.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-haltpoll.c b/drivers/cpuidle/cpuidle-haltpoll.c
index e66df22f9695..b641bc535102 100644
--- a/drivers/cpuidle/cpuidle-haltpoll.c
+++ b/drivers/cpuidle/cpuidle-haltpoll.c
@@ -28,11 +28,8 @@ static enum cpuhp_state haltpoll_hp_state;
 static int default_enter_idle(struct cpuidle_device *dev,
 			      struct cpuidle_driver *drv, int index)
 {
-	if (current_clr_polling_and_test()) {
-		local_irq_enable();
-		return index;
-	}
 	arch_cpu_idle();
+
 	return index;
 }
 
-- 
2.42.1



* [PATCH 4/7] cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-11-24 22:32 ` [PATCH 3/7] cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-12-12 13:09   ` Rafael J. Wysocki
  2023-11-24 22:32 ` [PATCH 5/7] cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD Frederic Weisbecker
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Peter Zijlstra

In order to further distinguish software and hardware TIF_NEED_RESCHED
polling cpuidle states, rename CPUIDLE_FLAG_POLLING to
CPUIDLE_FLAG_POLLING_SOFT before introducing CPUIDLE_FLAG_POLLING_HARD
and tag mwait users with it.

This will allow the cpuidle core to manage TIF_NR_POLLING on behalf of all
kinds of TIF_NEED_RESCHED polling states while keeping a necessary
distinction for the governors between software loops polling on
TIF_NEED_RESCHED and hardware monitored writes to thread flags.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 Documentation/driver-api/pm/cpuidle.rst | 2 +-
 drivers/cpuidle/cpuidle-powernv.c       | 2 +-
 drivers/cpuidle/cpuidle-pseries.c       | 4 ++--
 drivers/cpuidle/governors/ladder.c      | 4 ++--
 drivers/cpuidle/governors/menu.c        | 8 ++++----
 drivers/cpuidle/governors/teo.c         | 8 ++++----
 drivers/cpuidle/poll_state.c            | 2 +-
 include/linux/cpuidle.h                 | 2 +-
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/Documentation/driver-api/pm/cpuidle.rst b/Documentation/driver-api/pm/cpuidle.rst
index d477208604b8..5ad10dad8033 100644
--- a/Documentation/driver-api/pm/cpuidle.rst
+++ b/Documentation/driver-api/pm/cpuidle.rst
@@ -192,7 +192,7 @@ governors for computations related to idle state selection:
 
 :c:member:`flags`
 	Flags representing idle state properties.  Currently, governors only use
-	the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
+	the ``CPUIDLE_FLAG_POLLING_SOFT`` flag which is set if the given object
 	does not represent a real idle state, but an interface to a software
 	"loop" that can be used in order to avoid asking the processor to enter
 	any idle state at all.  [There are other flags used by the ``CPUIdle``
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 9ebedd972df0..675b8eb81ebd 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -158,7 +158,7 @@ static struct cpuidle_state powernv_states[CPUIDLE_STATE_MAX] = {
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = snooze_loop,
-		.flags = CPUIDLE_FLAG_POLLING },
+		.flags = CPUIDLE_FLAG_POLLING_SOFT },
 };
 
 static int powernv_cpuidle_cpu_online(unsigned int cpu)
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 14db9b7d985d..4e08c9a39172 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -271,7 +271,7 @@ static struct cpuidle_state dedicated_states[NR_DEDICATED_STATES] = {
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = &snooze_loop,
-		.flags = CPUIDLE_FLAG_POLLING },
+		.flags = CPUIDLE_FLAG_POLLING_SOFT },
 	{ /* CEDE */
 		.name = "CEDE",
 		.desc = "CEDE",
@@ -290,7 +290,7 @@ static struct cpuidle_state shared_states[] = {
 		.exit_latency = 0,
 		.target_residency = 0,
 		.enter = &snooze_loop,
-		.flags = CPUIDLE_FLAG_POLLING },
+		.flags = CPUIDLE_FLAG_POLLING_SOFT },
 	{ /* Shared Cede */
 		.name = "Shared Cede",
 		.desc = "Shared Cede",
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 8e9058c4ea63..a5f462b60d50 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -69,7 +69,7 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
 	struct ladder_device_state *last_state;
 	int last_idx = dev->last_state_idx;
-	int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0;
+	int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING_SOFT ? 1 : 0;
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
 	s64 last_residency;
 
@@ -133,7 +133,7 @@ static int ladder_enable_device(struct cpuidle_driver *drv,
 				struct cpuidle_device *dev)
 {
 	int i;
-	int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0;
+	int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING_SOFT ? 1 : 0;
 	struct ladder_device *ldev = &per_cpu(ladder_devices, dev->cpu);
 	struct ladder_device_state *lstate;
 	struct cpuidle_state *state;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b96e3da0fedd..98ec067805b6 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -320,7 +320,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		 * it right away and keep the tick running if state[0] is a
 		 * polling one.
 		 */
-		*stop_tick = !(drv->states[0].flags & CPUIDLE_FLAG_POLLING);
+		*stop_tick = !(drv->states[0].flags & CPUIDLE_FLAG_POLLING_SOFT);
 		return 0;
 	}
 
@@ -365,7 +365,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			 * Use a physical idle state, not busy polling, unless
 			 * a timer is going to trigger soon enough.
 			 */
-			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
+			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING_SOFT) &&
 			    s->exit_latency_ns <= latency_req &&
 			    s->target_residency_ns <= data->next_timer_ns) {
 				predicted_ns = s->target_residency_ns;
@@ -411,7 +411,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING_SOFT) ||
 	     predicted_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) {
 		*stop_tick = false;
 
@@ -492,7 +492,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		 * duration predictor do a better job next time.
 		 */
 		measured_ns = 9 * MAX_INTERESTING / 10;
-	} else if ((drv->states[last_idx].flags & CPUIDLE_FLAG_POLLING) &&
+	} else if ((drv->states[last_idx].flags & CPUIDLE_FLAG_POLLING_SOFT) &&
 		   dev->poll_time_limit) {
 		/*
 		 * The CPU exited the "polling" state due to a time limit, so
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 7244f71c59c5..f86e16d0ffac 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -354,7 +354,7 @@ static int teo_find_shallower_state(struct cpuidle_driver *drv,
 
 	for (i = state_idx - 1; i >= 0; i--) {
 		if (dev->states_usage[i].disable ||
-				(no_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING))
+				(no_poll && drv->states[i].flags & CPUIDLE_FLAG_POLLING_SOFT))
 			continue;
 
 		state_idx = i;
@@ -426,7 +426,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		 * all.  If state 1 is disabled, though, state 0 must be used
 		 * anyway.
 		 */
-		if ((!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
+		if ((!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING_SOFT) &&
 		    teo_state_ok(0, drv)) || dev->states_usage[1].disable) {
 			idx = 0;
 			goto out_tick;
@@ -584,7 +584,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	 * the current candidate state is low enough and skip the timers
 	 * check in that case too.
 	 */
-	if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
+	if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING_SOFT) &&
 	    drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
 		goto out_tick;
 
@@ -616,7 +616,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	 * one or the expected idle duration is shorter than the tick period
 	 * length.
 	 */
-	if ((!(drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
+	if ((!(drv->states[idx].flags & CPUIDLE_FLAG_POLLING_SOFT) &&
 	    duration_ns >= TICK_NSEC) || tick_nohz_tick_stopped())
 		return idx;
 
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..a2fe173de117 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -57,6 +57,6 @@ void cpuidle_poll_state_init(struct cpuidle_driver *drv)
 	state->target_residency_ns = 0;
 	state->power_usage = -1;
 	state->enter = poll_idle;
-	state->flags = CPUIDLE_FLAG_POLLING;
+	state->flags = CPUIDLE_FLAG_POLLING_SOFT;
 }
 EXPORT_SYMBOL_GPL(cpuidle_poll_state_init);
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3183aeb7f5b4..66b59868622c 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -78,7 +78,7 @@ struct cpuidle_state {
 
 /* Idle State Flags */
 #define CPUIDLE_FLAG_NONE       	(0x00)
-#define CPUIDLE_FLAG_POLLING		BIT(0) /* polling state */
+#define CPUIDLE_FLAG_POLLING_SOFT		BIT(0) /* polling state */
 #define CPUIDLE_FLAG_COUPLED		BIT(1) /* state applies to multiple cpus */
 #define CPUIDLE_FLAG_TIMER_STOP 	BIT(2) /* timer is stopped on this state */
 #define CPUIDLE_FLAG_UNUSABLE		BIT(3) /* avoid using this state */
-- 
2.42.1



* [PATCH 5/7] cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2023-11-24 22:32 ` [PATCH 4/7] cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-12-12 13:12   ` Rafael J. Wysocki
  2023-11-24 22:32 ` [PATCH 6/7] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD states Frederic Weisbecker
  2023-11-24 22:32 ` [PATCH 7/7] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
  6 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Frederic Weisbecker

From: Peter Zijlstra <peterz@infradead.org>

Provide a way to tell the cpuidle core about states polling/monitoring
TIF_NEED_RESCHED on the hardware level, monitor/mwait users being the
only examples in use.

This will allow the cpuidle core to manage TIF_NR_POLLING on behalf of all
kinds of TIF_NEED_RESCHED polling states while keeping a necessary
distinction for the governors between software loops polling on
TIF_NEED_RESCHED and hardware monitored writes to thread flags.
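
To make the intended split concrete, here is a small stand-alone sketch
of how the two flags could be queried. The bit positions match the
definitions in include/linux/cpuidle.h from this patch; the helper
names state_polls_need_resched() and state_is_software_loop() are
hypothetical and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)				(1U << (n))
#define CPUIDLE_FLAG_POLLING_SOFT	BIT(0) /* software need_resched() polling */
#define CPUIDLE_FLAG_POLLING_HARD	BIT(7) /* hardware need_resched() polling */

/* The two kinds of polling must stay distinguishable. */
static_assert((CPUIDLE_FLAG_POLLING_SOFT & CPUIDLE_FLAG_POLLING_HARD) == 0,
	      "polling flags must use distinct bits");

/*
 * Hypothetical core-side question: does the state watch
 * TIF_NEED_RESCHED by itself, in software or in hardware?
 */
static bool state_polls_need_resched(unsigned int flags)
{
	return (flags & (CPUIDLE_FLAG_POLLING_SOFT |
			 CPUIDLE_FLAG_POLLING_HARD)) != 0;
}

/*
 * Hypothetical governor-side question: is this a busy loop rather
 * than a real idle state?
 */
static bool state_is_software_loop(unsigned int flags)
{
	return (flags & CPUIDLE_FLAG_POLLING_SOFT) != 0;
}
```

Under this reading, the governors would keep testing only the SOFT bit,
while the core could eventually treat either bit as "no IPI needed to
wake this CPU".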

[fweisbec: _ Initialize flag from acpi_processor_setup_cstates() instead
             of acpi_processor_setup_lpi_states(), as the latter seems to
             be about arm64...
           _ Rename CPUIDLE_FLAG_NO_IPI to CPUIDLE_FLAG_POLLING_HARD]

Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/acpi/processor_idle.c | 3 +++
 drivers/idle/intel_idle.c     | 5 ++++-
 include/linux/cpuidle.h       | 3 ++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 3a34a8c425fe..a77a4d4b0dad 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -814,6 +814,9 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
 			if (cx->type != ACPI_STATE_C3)
 				drv->safe_state_index = count;
 		}
+
+		if (cx->entry_method == ACPI_CSTATE_FFH)
+			state->flags |= CPUIDLE_FLAG_POLLING_HARD;
 		/*
 		 * Halt-induced C1 is not good for ->enter_s2idle, because it
 		 * re-enables interrupts on exit.  Moreover, C1 is generally not
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3e01a6b23e75..bc56624fe0b5 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1563,7 +1563,8 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
 		if (cx->type > ACPI_STATE_C1)
 			state->target_residency *= 3;
 
-		state->flags = MWAIT2flg(cx->address);
+		state->flags = MWAIT2flg(cx->address) | CPUIDLE_FLAG_POLLING_HARD;
+
 		if (cx->type > ACPI_STATE_C2)
 			state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
 
@@ -1836,6 +1837,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
 
 static void state_update_enter_method(struct cpuidle_state *state, int cstate)
 {
+	state->flags |= CPUIDLE_FLAG_POLLING_HARD;
+
 	if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
 		/*
 		 * Combining with XSTATE with IBRS or IRQ_ENABLE flags
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 66b59868622c..873fdf200dc3 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -78,13 +78,14 @@ struct cpuidle_state {
 
 /* Idle State Flags */
 #define CPUIDLE_FLAG_NONE       	(0x00)
-#define CPUIDLE_FLAG_POLLING_SOFT		BIT(0) /* polling state */
+#define CPUIDLE_FLAG_POLLING_SOFT	BIT(0) /* software need_resched() polling state */
 #define CPUIDLE_FLAG_COUPLED		BIT(1) /* state applies to multiple cpus */
 #define CPUIDLE_FLAG_TIMER_STOP 	BIT(2) /* timer is stopped on this state */
 #define CPUIDLE_FLAG_UNUSABLE		BIT(3) /* avoid using this state */
 #define CPUIDLE_FLAG_OFF		BIT(4) /* disable this state by default */
 #define CPUIDLE_FLAG_TLB_FLUSHED	BIT(5) /* idle-state flushes TLBs */
 #define CPUIDLE_FLAG_RCU_IDLE		BIT(6) /* idle-state takes care of RCU */
+#define CPUIDLE_FLAG_POLLING_HARD	BIT(7) /* hardware need_resched() polling state */
 
 struct cpuidle_device_kobj;
 struct cpuidle_state_kobj;
-- 
2.42.1



* [PATCH 6/7] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD states
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2023-11-24 22:32 ` [PATCH 5/7] cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-12-12 13:21   ` Rafael J. Wysocki
  2023-11-24 22:32 ` [PATCH 7/7] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
  6 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC
  To: LKML
  Cc: Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Frederic Weisbecker

From: Peter Zijlstra <peterz@infradead.org>

The current handling of TIF_NR_POLLING is a bit of a maze:

1) A common brief part in the generic idle loop sets TIF_NR_POLLING
  while cpuidle selects an appropriate state and the tick is evaluated
  and then stopped. Summary: One pair of set/clear

2) The cpuidle state is then entered with TIF_NR_POLLING cleared, but if
  the state polls on need_resched() (software or hardware), it sets
  TIF_NR_POLLING again and clears it when it completes. Summary: another
  pair of set/clear

3) goto 1)

However those costly atomic operations, fully ordered RmW for some of
them, could be avoided if the cpuidle core knew in advance if the target
state polls on need_resched(). If so, TIF_NR_POLLING could simply be
set once before entering the idle loop and cleared once after idle loop
exit.

Start dealing with that by handling TIF_NR_POLLING on behalf of
CPUIDLE_FLAG_POLLING_HARD states.
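
The saving can be estimated with a back-of-the-envelope model. The
sketch below simply counts TIF_NR_POLLING set/clear operations
following the description above; it is a simplification, not a
measurement, and the function names toggles_before() and
toggles_after() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Old scheme: every pass through the idle loop pays one set/clear pair
 * in the generic code, plus another pair inside a polling state.
 */
static unsigned int toggles_before(bool state_polls, unsigned int iterations)
{
	unsigned int per_iteration = state_polls ? 4 : 2;

	assert(iterations <= UINT_MAX / 4); /* keep the product in range */
	return per_iteration * iterations;
}

/*
 * Target scheme: a polling state leaves the flag alone, so only the
 * set before the idle loop and the clear after it remain. A
 * non-polling state still clears the flag on entry and sets it again
 * on return.
 */
static unsigned int toggles_after(bool state_polls, unsigned int iterations)
{
	assert(iterations <= UINT_MAX / 4);
	if (state_polls)
		return 2;
	return 2 * iterations;
}
```

In this model, ten consecutive passes through a polling state drop from
40 flag operations to 2, while non-polling states are unchanged.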

[fweisbec: _ Handle broadcast properly
           _ Ignore mwait_idle() as it can be used by default_idle_call()]

Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/include/asm/mwait.h |  3 +--
 drivers/cpuidle/cpuidle.c    | 22 +++++++++++++++++++-
 include/linux/sched/idle.h   |  7 ++++++-
 kernel/sched/idle.c          | 40 +++++++++++++-----------------------
 4 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 920426d691ce..3634d00e5c37 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,7 +116,7 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
  */
 static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
-	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
+	if (static_cpu_has_bug(X86_BUG_MONITOR) || !need_resched()) {
 		if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
 			mb();
 			clflush((void *)&current_thread_info()->flags);
@@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
 			}
 		}
 	}
-	current_clr_polling();
 }
 
 /*
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 737a026ef58a..49078cc83f4a 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -213,10 +213,10 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
 				 int index)
 {
 	int entered_state;
-
 	struct cpuidle_state *target_state = &drv->states[index];
 	bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
 	ktime_t time_start, time_end;
+	bool polling;
 
 	instrumentation_begin();
 
@@ -236,6 +236,23 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
 		broadcast = false;
 	}
 
+	polling = target_state->flags & CPUIDLE_FLAG_POLLING_HARD;
+
+	/*
+	 * If the target state doesn't poll on need_resched(), this is
+	 * the last check after which further TIF_NEED_RESCHED remote setting
+	 * will involve an IPI.
+	 */
+	if (!polling && current_clr_polling_and_test()) {
+		if (broadcast)
+			tick_broadcast_exit();
+		dev->last_residency_ns = 0;
+		local_irq_enable();
+		instrumentation_end();
+		return -EBUSY;
+	}
+
+
 	if (target_state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
 		leave_mm(dev->cpu);
 
@@ -335,6 +352,9 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
 		dev->states_usage[index].rejected++;
 	}
 
+	if (!polling)
+		__current_set_polling();
+
 	instrumentation_end();
 
 	return entered_state;
diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index 478084f9105e..50c13531f5d8 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -68,6 +68,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
 
 static __always_inline bool __must_check current_clr_polling_and_test(void)
 {
+	bool ret;
+
 	__current_clr_polling();
 
 	/*
@@ -76,7 +78,10 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
 	 */
 	smp_mb__after_atomic();
 
-	return unlikely(tif_need_resched());
+	ret = unlikely(tif_need_resched());
+	if (ret)
+		__current_set_polling();
+	return ret;
 }
 
 #else
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 565f8374ddbb..4e554b4e3781 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -94,11 +94,12 @@ void __cpuidle default_idle_call(void)
 		stop_critical_timings();
 
 		ct_cpuidle_enter();
-		arch_cpu_idle();
+		arch_cpu_idle(); // XXX assumes !polling
 		ct_cpuidle_exit();
 
 		start_critical_timings();
 		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
+		__current_set_polling();
 	}
 	local_irq_enable();
 	instrumentation_end();
@@ -107,31 +108,14 @@ void __cpuidle default_idle_call(void)
 static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
 			       struct cpuidle_device *dev)
 {
+	int ret;
+
 	if (current_clr_polling_and_test())
 		return -EBUSY;
 
-	return cpuidle_enter_s2idle(drv, dev);
-}
-
-static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
-		      int next_state)
-{
-	/*
-	 * The idle task must be scheduled, it is pointless to go to idle, just
-	 * update no idle residency and return.
-	 */
-	if (current_clr_polling_and_test()) {
-		dev->last_residency_ns = 0;
-		local_irq_enable();
-		return -EBUSY;
-	}
-
-	/*
-	 * Enter the idle state previously returned by the governor decision.
-	 * This function will block until an interrupt occurs and will take
-	 * care of re-enabling the local interrupts
-	 */
-	return cpuidle_enter(drv, dev, next_state);
+	ret = cpuidle_enter_s2idle(drv, dev);
+	__current_set_polling();
+	return ret;
 }
 
 /**
@@ -198,7 +182,7 @@ static void cpuidle_idle_call(void)
 		tick_nohz_idle_stop_tick();
 
 		next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
-		call_cpuidle(drv, dev, next_state);
+		cpuidle_enter(drv, dev, next_state);
 	} else {
 		bool stop_tick = true;
 
@@ -212,7 +196,12 @@ static void cpuidle_idle_call(void)
 		else
 			tick_nohz_idle_retain_tick();
 
-		entered_state = call_cpuidle(drv, dev, next_state);
+		/*
+		 * Enter the idle state previously returned by the governor decision.
+		 * This function will block until an interrupt occurs and will take
+		 * care of re-enabling the local interrupts.
+		 */
+		entered_state = cpuidle_enter(drv, dev, next_state);
 		/*
 		 * Give the governor an opportunity to reflect on the outcome
 		 */
@@ -220,7 +209,6 @@ static void cpuidle_idle_call(void)
 	}
 
 exit_idle:
-	__current_set_polling();
 
 	/*
 	 * It is up to the idle functions to reenable local interrupts
-- 
2.42.1
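
The handshake touched by the current_clr_polling_and_test() hunk above can be modelled in userspace. What follows is only a sketch using C11 atomics, not kernel code: model_thread_flags, MODEL_NEED_RESCHED, MODEL_POLLING and both function names are made up for illustration, and the bit values are arbitrary.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines its own per architecture. */
#define MODEL_NEED_RESCHED (1u << 0)
#define MODEL_POLLING      (1u << 1)

static atomic_uint model_thread_flags;

/*
 * Waker side: set NEED_RESCHED and report whether an IPI is still needed.
 * If the target advertised that it polls, the flag write alone is enough.
 */
static bool model_resched_needs_ipi(void)
{
	unsigned int old = atomic_fetch_or(&model_thread_flags, MODEL_NEED_RESCHED);

	return !(old & MODEL_POLLING);
}

/*
 * Idle side, mirroring the patched helper: stop advertising polling, order
 * that against the test, and restore the polling bit if a reschedule is
 * already pending, so the caller returns with polling advertised again.
 */
static bool model_clr_polling_and_test(void)
{
	bool pending;

	atomic_fetch_and(&model_thread_flags, ~MODEL_POLLING);
	atomic_thread_fence(memory_order_seq_cst);

	pending = atomic_load(&model_thread_flags) & MODEL_NEED_RESCHED;
	if (pending)
		atomic_fetch_or(&model_thread_flags, MODEL_POLLING);
	return pending;
}
```

In this model a waker that observes MODEL_POLLING skips the IPI, which is why the idle side must test for a pending reschedule after clearing the bit and before actually going idle.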



* [PATCH 7/7] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states
  2023-11-24 22:32 [PATCH 0/7] cpuidle: Handle TIF_NR_POLLING on behalf of polling idle states Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2023-11-24 22:32 ` [PATCH 6/7] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD states Frederic Weisbecker
@ 2023-11-24 22:32 ` Frederic Weisbecker
  2023-12-12 13:27   ` Rafael J. Wysocki
  6 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2023-11-24 22:32 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Peter Zijlstra

Software polling idle states set TIF_NR_POLLING again on entry and clear
it upon exit. This involves error-prone duplicated code and wasted cycles
performing atomic operations, some of them fully ordered RmW.

To avoid this, rely instead on the same generic TIF_NR_POLLING handling
that is currently in use for hardware polling states.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 drivers/cpuidle/cpuidle-powernv.c | 10 ----------
 drivers/cpuidle/cpuidle-pseries.c | 11 -----------
 drivers/cpuidle/cpuidle.c         |  4 ++--
 drivers/cpuidle/poll_state.c      | 30 ++++++++++++------------------
 4 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 675b8eb81ebd..b88bbf7ead41 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -71,8 +71,6 @@ static int snooze_loop(struct cpuidle_device *dev,
 {
 	u64 snooze_exit_time;
 
-	set_thread_flag(TIF_POLLING_NRFLAG);
-
 	local_irq_enable();
 
 	snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
@@ -81,21 +79,13 @@ static int snooze_loop(struct cpuidle_device *dev,
 	HMT_very_low();
 	while (!need_resched()) {
 		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
-			/*
-			 * Task has not woken up but we are exiting the polling
-			 * loop anyway. Require a barrier after polling is
-			 * cleared to order subsequent test of need_resched().
-			 */
-			clear_thread_flag(TIF_POLLING_NRFLAG);
 			dev->poll_time_limit = true;
-			smp_mb();
 			break;
 		}
 	}
 
 	HMT_medium();
 	ppc64_runlatch_on();
-	clear_thread_flag(TIF_POLLING_NRFLAG);
 
 	local_irq_disable();
 
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 4e08c9a39172..0ae76512b740 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -39,8 +39,6 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 {
 	u64 snooze_exit_time;
 
-	set_thread_flag(TIF_POLLING_NRFLAG);
-
 	pseries_idle_prolog();
 	raw_local_irq_enable();
 	snooze_exit_time = get_tb() + snooze_timeout;
@@ -50,21 +48,12 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		HMT_low();
 		HMT_very_low();
 		if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
-			/*
-			 * Task has not woken up but we are exiting the polling
-			 * loop anyway. Require a barrier after polling is
-			 * cleared to order subsequent test of need_resched().
-			 */
 			dev->poll_time_limit = true;
-			clear_thread_flag(TIF_POLLING_NRFLAG);
-			smp_mb();
 			break;
 		}
 	}
 
 	HMT_medium();
-	clear_thread_flag(TIF_POLLING_NRFLAG);
-
 	raw_local_irq_disable();
 
 	pseries_idle_epilog();
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 49078cc83f4a..9eb811b5d8b6 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -236,8 +236,8 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
 		broadcast = false;
 	}
 
-	polling = target_state->flags & CPUIDLE_FLAG_POLLING_HARD;
-
+	polling = (target_state->flags & (CPUIDLE_FLAG_POLLING_SOFT |
+					  CPUIDLE_FLAG_POLLING_HARD));
 	/*
 	 * If the target state doesn't poll on need_resched(), this is
 	 * the last check after which further TIF_NEED_RESCHED remote setting
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index a2fe173de117..3bfa251b344a 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -13,35 +13,29 @@
 static int __cpuidle poll_idle(struct cpuidle_device *dev,
 			       struct cpuidle_driver *drv, int index)
 {
-	u64 time_start;
-
-	time_start = local_clock_noinstr();
+	u64 time_start = local_clock_noinstr();
+	unsigned int loop_count = 0;
+	u64 limit;
 
 	dev->poll_time_limit = false;
 
 	raw_local_irq_enable();
-	if (!current_set_polling_and_test()) {
-		unsigned int loop_count = 0;
-		u64 limit;
 
-		limit = cpuidle_poll_time(drv, dev);
+	limit = cpuidle_poll_time(drv, dev);
 
-		while (!need_resched()) {
-			cpu_relax();
-			if (loop_count++ < POLL_IDLE_RELAX_COUNT)
-				continue;
+	while (!need_resched()) {
+		cpu_relax();
+		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
+			continue;
 
-			loop_count = 0;
-			if (local_clock_noinstr() - time_start > limit) {
-				dev->poll_time_limit = true;
-				break;
-			}
+		loop_count = 0;
+		if (local_clock_noinstr() - time_start > limit) {
+			dev->poll_time_limit = true;
+			break;
 		}
 	}
 	raw_local_irq_disable();
 
-	current_clr_polling();
-
 	return index;
 }
 
-- 
2.42.1
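
The simplified poll_idle() loop above can be approximated outside the kernel as follows. This is a rough sketch under stated assumptions: model_need_resched, model_clock_ns(), model_poll_idle() and MODEL_RELAX_COUNT are hypothetical stand-ins, the clock is the C11 wall clock rather than local_clock_noinstr(), and cpu_relax() is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for POLL_IDLE_RELAX_COUNT. */
#define MODEL_RELAX_COUNT 200u

/* Hypothetical stand-in for the TIF_NEED_RESCHED thread flag. */
static atomic_bool model_need_resched;

/* C11 wall clock in nanoseconds; the kernel uses local_clock_noinstr(). */
static uint64_t model_clock_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Returns true when the loop ended because limit_ns elapsed. */
static bool model_poll_idle(uint64_t limit_ns)
{
	uint64_t time_start = model_clock_ns();
	unsigned int loop_count = 0;

	while (!atomic_load(&model_need_resched)) {
		/* Only sample the clock once every MODEL_RELAX_COUNT spins. */
		if (loop_count++ < MODEL_RELAX_COUNT)
			continue;

		loop_count = 0;
		if (model_clock_ns() - time_start > limit_ns)
			return true;
	}
	return false;
}
```

The return value plays the role of dev->poll_time_limit. Note that the loop no longer touches any polling flag itself, which is the point of the patch: that bookkeeping is left to the caller.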



* Re: [PATCH 4/7] cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT
  2023-11-24 22:32 ` [PATCH 4/7] cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT Frederic Weisbecker
@ 2023-12-12 13:09   ` Rafael J. Wysocki
  2024-02-08 16:37     ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2023-12-12 13:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Peter Zijlstra

On Fri, Nov 24, 2023 at 11:32 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> In order to further distinguish software and hardware TIF_NEED_RESCHED
> polling cpuidle states, rename CPUIDLE_FLAG_POLLING to
> CPUIDLE_FLAG_POLLING_SOFT before introducing CPUIDLE_FLAG_POLLING_HARD
> and tag mwait users with it.

Well, if MWAIT users are the only category that will be tagged with
the new flag, it can be called CPUIDLE_FLAG_POLLING_MWAIT or even
CPUIDLE_FLAG_MWAIT for that matter and the $subject patch won't be
necessary any more AFAICS.

> This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
> kinds of TIF_NEED_RESCHED polling states while keeping a necessary
> distinction for the governors between software loops polling on
> TIF_NEED_RESCHED and hardware monitored writes to thread flags.

Fair enough, but what about using a different name for the new flag
and leaving the old one as is?


* Re: [PATCH 5/7] cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD
  2023-11-24 22:32 ` [PATCH 5/7] cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD Frederic Weisbecker
@ 2023-12-12 13:12   ` Rafael J. Wysocki
  2024-02-08 16:43     ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2023-12-12 13:12 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen

On Fri, Nov 24, 2023 at 11:32 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> Provide a way to tell the cpuidle core about states polling/monitoring
> TIF_NEED_RESCHED on the hardware level, monitor/mwait users being the
> only examples in use.
>
> This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
> kinds of TIF_NEED_RESCHED polling states while keeping a necessary
> distinction for the governors between software loops polling on
> TIF_NEED_RESCHED and hardware monitored writes to thread flags.
>
> [fweisbec: _ Initialize flag from acpi_processor_setup_cstates() instead
>              of acpi_processor_setup_lpi_states(), as the latter seem to
>              be about arm64...
>            _ Rename CPUIDLE_FLAG_NO_IPI to CPUIDLE_FLAG_POLLING_HARD]
>
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  drivers/acpi/processor_idle.c | 3 +++
>  drivers/idle/intel_idle.c     | 5 ++++-
>  include/linux/cpuidle.h       | 3 ++-
>  3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 3a34a8c425fe..a77a4d4b0dad 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -814,6 +814,9 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
>                         if (cx->type != ACPI_STATE_C3)
>                                 drv->safe_state_index = count;
>                 }
> +
> +               if (cx->entry_method == ACPI_CSTATE_FFH)
> +                       state->flags |= CPUIDLE_FLAG_POLLING_HARD;
>                 /*
>                  * Halt-induced C1 is not good for ->enter_s2idle, because it
>                  * re-enables interrupts on exit.  Moreover, C1 is generally not
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 3e01a6b23e75..bc56624fe0b5 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -1563,7 +1563,8 @@ static void __init intel_idle_init_cstates_acpi(struct cpuidle_driver *drv)
>                 if (cx->type > ACPI_STATE_C1)
>                         state->target_residency *= 3;
>
> -               state->flags = MWAIT2flg(cx->address);
> +               state->flags = MWAIT2flg(cx->address) | CPUIDLE_FLAG_POLLING_HARD;
> +
>                 if (cx->type > ACPI_STATE_C2)
>                         state->flags |= CPUIDLE_FLAG_TLB_FLUSHED;
>
> @@ -1836,6 +1837,8 @@ static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
>
>  static void state_update_enter_method(struct cpuidle_state *state, int cstate)
>  {
> +       state->flags |= CPUIDLE_FLAG_POLLING_HARD;
> +
>         if (state->flags & CPUIDLE_FLAG_INIT_XSTATE) {
>                 /*
>                  * Combining with XSTATE with IBRS or IRQ_ENABLE flags
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 66b59868622c..873fdf200dc3 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -78,13 +78,14 @@ struct cpuidle_state {
>
>  /* Idle State Flags */
>  #define CPUIDLE_FLAG_NONE              (0x00)
> -#define CPUIDLE_FLAG_POLLING_SOFT              BIT(0) /* polling state */
> +#define CPUIDLE_FLAG_POLLING_SOFT      BIT(0) /* software need_resched() polling state */
>  #define CPUIDLE_FLAG_COUPLED           BIT(1) /* state applies to multiple cpus */
>  #define CPUIDLE_FLAG_TIMER_STOP        BIT(2) /* timer is stopped on this state */
>  #define CPUIDLE_FLAG_UNUSABLE          BIT(3) /* avoid using this state */
>  #define CPUIDLE_FLAG_OFF               BIT(4) /* disable this state by default */
>  #define CPUIDLE_FLAG_TLB_FLUSHED       BIT(5) /* idle-state flushes TLBs */
>  #define CPUIDLE_FLAG_RCU_IDLE          BIT(6) /* idle-state takes care of RCU */
> +#define CPUIDLE_FLAG_POLLING_HARD      BIT(7) /* hardware need_resched() polling state */

Hardware need_resched() monitoring rather?  This doesn't do what
"polling" usually means AFAICS.

>
>  struct cpuidle_device_kobj;
>  struct cpuidle_state_kobj;
> --


* Re: [PATCH 6/7] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD states
  2023-11-24 22:32 ` [PATCH 6/7] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD states Frederic Weisbecker
@ 2023-12-12 13:21   ` Rafael J. Wysocki
  2024-02-08 17:03     ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2023-12-12 13:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Rafael J . Wysocki, Daniel Lezcano,
	linux-pm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen

On Fri, Nov 24, 2023 at 11:32 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> The current handling of TIF_NR_POLLING is a bit of a maze:
>
> 1) A common brief part in the generic idle loop sets TIF_NR_POLLING
>   while cpuidle selects an appropriate state and the tick is evaluated
>   and then stopped. Summary: One pair of set/clear
>
> 2) The state cpuidle is then called with TIF_NR_POLLING cleared but if
>   the state polls on need_resched() (software or hardware), it sets
>   again TIF_NR_POLLING and clears it when it completes. Summary: another
>   pair of set/clear
>
> 3) goto 1)
>
> However those costly atomic operations, fully ordered RmW for some of
> them, could be avoided if the cpuidle core knew in advance if the target
> state polls on need_resched(). If so, TIF_NR_POLLING could simply be
> set once before entering the idle loop and cleared once after idle loop
> exit.
>
> Start dealing with that by handling TIF_NR_POLLING on behalf of
> CPUIDLE_FLAG_POLLING_HARD states.
>
> [fweisbec: _ Handle broadcast properly
>            _ Ignore mwait_idle() as it can be used by default_idle_call()]
>
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  arch/x86/include/asm/mwait.h |  3 +--
>  drivers/cpuidle/cpuidle.c    | 22 +++++++++++++++++++-
>  include/linux/sched/idle.h   |  7 ++++++-
>  kernel/sched/idle.c          | 40 +++++++++++++-----------------------
>  4 files changed, 42 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 920426d691ce..3634d00e5c37 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -116,7 +116,7 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
>   */
>  static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
>  {
> -       if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
> +       if (static_cpu_has_bug(X86_BUG_MONITOR) || !need_resched()) {
>                 if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
>                         mb();
>                         clflush((void *)&current_thread_info()->flags);
> @@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
>                         }
>                 }
>         }
> -       current_clr_polling();
>  }
>
>  /*
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 737a026ef58a..49078cc83f4a 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -213,10 +213,10 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
>                                  int index)
>  {
>         int entered_state;
> -
>         struct cpuidle_state *target_state = &drv->states[index];
>         bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
>         ktime_t time_start, time_end;
> +       bool polling;
>
>         instrumentation_begin();
>
> @@ -236,6 +236,23 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
>                 broadcast = false;
>         }
>
> +       polling = target_state->flags & CPUIDLE_FLAG_POLLING_HARD;
> +
> +       /*
> +        * If the target state doesn't poll on need_resched(), this is
> +        * the last check after which further TIF_NEED_RESCHED remote setting
> +        * will involve an IPI.
> +        */
> +       if (!polling && current_clr_polling_and_test()) {
> +               if (broadcast)
> +                       tick_broadcast_exit();
> +               dev->last_residency_ns = 0;
> +               local_irq_enable();
> +               instrumentation_end();
> +               return -EBUSY;
> +       }
> +
> +
>         if (target_state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
>                 leave_mm(dev->cpu);
>
> @@ -335,6 +352,9 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
>                 dev->states_usage[index].rejected++;
>         }
>
> +       if (!polling)
> +               __current_set_polling();
> +
>         instrumentation_end();
>
>         return entered_state;
> diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
> index 478084f9105e..50c13531f5d8 100644
> --- a/include/linux/sched/idle.h
> +++ b/include/linux/sched/idle.h
> @@ -68,6 +68,8 @@ static __always_inline bool __must_check current_set_polling_and_test(void)
>
>  static __always_inline bool __must_check current_clr_polling_and_test(void)
>  {
> +       bool ret;
> +
>         __current_clr_polling();
>
>         /*
> @@ -76,7 +78,10 @@ static __always_inline bool __must_check current_clr_polling_and_test(void)
>          */
>         smp_mb__after_atomic();
>
> -       return unlikely(tif_need_resched());
> +       ret = unlikely(tif_need_resched());
> +       if (ret)
> +               __current_set_polling();
> +       return ret;
>  }
>
>  #else
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 565f8374ddbb..4e554b4e3781 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -94,11 +94,12 @@ void __cpuidle default_idle_call(void)
>                 stop_critical_timings();
>
>                 ct_cpuidle_enter();
> -               arch_cpu_idle();
> +               arch_cpu_idle(); // XXX assumes !polling
>                 ct_cpuidle_exit();
>
>                 start_critical_timings();
>                 trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> +               __current_set_polling();
>         }
>         local_irq_enable();
>         instrumentation_end();
> @@ -107,31 +108,14 @@ void __cpuidle default_idle_call(void)
>  static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
>                                struct cpuidle_device *dev)
>  {
> +       int ret;
> +
>         if (current_clr_polling_and_test())
>                 return -EBUSY;
>
> -       return cpuidle_enter_s2idle(drv, dev);
> -}
> -
> -static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> -                     int next_state)

Since you are removing call_cpuidle(), you may as well remove
call_cpuidle_s2idle() which only has one caller anyway.

> -{
> -       /*
> -        * The idle task must be scheduled, it is pointless to go to idle, just
> -        * update no idle residency and return.
> -        */
> -       if (current_clr_polling_and_test()) {
> -               dev->last_residency_ns = 0;
> -               local_irq_enable();
> -               return -EBUSY;
> -       }
> -
> -       /*
> -        * Enter the idle state previously returned by the governor decision.
> -        * This function will block until an interrupt occurs and will take
> -        * care of re-enabling the local interrupts
> -        */
> -       return cpuidle_enter(drv, dev, next_state);
> +       ret = cpuidle_enter_s2idle(drv, dev);
> +       __current_set_polling();
> +       return ret;
>  }
>
>  /**
> @@ -198,7 +182,7 @@ static void cpuidle_idle_call(void)
>                 tick_nohz_idle_stop_tick();
>
>                 next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns);
> -               call_cpuidle(drv, dev, next_state);
> +               cpuidle_enter(drv, dev, next_state);
>         } else {
>                 bool stop_tick = true;
>
> @@ -212,7 +196,12 @@ static void cpuidle_idle_call(void)
>                 else
>                         tick_nohz_idle_retain_tick();
>
> -               entered_state = call_cpuidle(drv, dev, next_state);
> +               /*
> +                * Enter the idle state previously returned by the governor decision.
> +                * This function will block until an interrupt occurs and will take
> +                * care of re-enabling the local interrupts.
> +                */
> +               entered_state = cpuidle_enter(drv, dev, next_state);
>                 /*
>                  * Give the governor an opportunity to reflect on the outcome
>                  */
> @@ -220,7 +209,6 @@ static void cpuidle_idle_call(void)
>         }
>
>  exit_idle:
> -       __current_set_polling();
>
>         /*
>          * It is up to the idle functions to reenable local interrupts
> --
> 2.42.1
>


* Re: [PATCH 7/7] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states
  2023-11-24 22:32 ` [PATCH 7/7] cpuidle: Handle TIF_NR_POLLING on behalf of software polling idle states Frederic Weisbecker
@ 2023-12-12 13:27   ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2023-12-12 13:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Rafael J . Wysocki, Daniel Lezcano, linux-pm,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Peter Zijlstra

On Fri, Nov 24, 2023 at 11:33 PM Frederic Weisbecker
<frederic@kernel.org> wrote:
>
> Software polling idle states set TIF_NR_POLLING again on entry and clear
> it upon exit. This involves error-prone duplicated code and wasted cycles
> performing atomic operations, some of them fully ordered RmW.
>
> To avoid this, rely instead on the same generic TIF_NR_POLLING handling
> that is currently in use for hardware polling states.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  drivers/cpuidle/cpuidle-powernv.c | 10 ----------
>  drivers/cpuidle/cpuidle-pseries.c | 11 -----------
>  drivers/cpuidle/cpuidle.c         |  4 ++--
>  drivers/cpuidle/poll_state.c      | 30 ++++++++++++------------------
>  4 files changed, 14 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 675b8eb81ebd..b88bbf7ead41 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -71,8 +71,6 @@ static int snooze_loop(struct cpuidle_device *dev,
>  {
>         u64 snooze_exit_time;
>
> -       set_thread_flag(TIF_POLLING_NRFLAG);
> -
>         local_irq_enable();
>
>         snooze_exit_time = get_tb() + get_snooze_timeout(dev, drv, index);
> @@ -81,21 +79,13 @@ static int snooze_loop(struct cpuidle_device *dev,
>         HMT_very_low();
>         while (!need_resched()) {
>                 if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> -                       /*
> -                        * Task has not woken up but we are exiting the polling
> -                        * loop anyway. Require a barrier after polling is
> -                        * cleared to order subsequent test of need_resched().
> -                        */
> -                       clear_thread_flag(TIF_POLLING_NRFLAG);
>                         dev->poll_time_limit = true;
> -                       smp_mb();
>                         break;
>                 }
>         }
>
>         HMT_medium();
>         ppc64_runlatch_on();
> -       clear_thread_flag(TIF_POLLING_NRFLAG);
>
>         local_irq_disable();
>
> diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
> index 4e08c9a39172..0ae76512b740 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -39,8 +39,6 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  {
>         u64 snooze_exit_time;
>
> -       set_thread_flag(TIF_POLLING_NRFLAG);
> -
>         pseries_idle_prolog();
>         raw_local_irq_enable();
>         snooze_exit_time = get_tb() + snooze_timeout;
> @@ -50,21 +48,12 @@ int snooze_loop(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>                 HMT_low();
>                 HMT_very_low();
>                 if (likely(snooze_timeout_en) && get_tb() > snooze_exit_time) {
> -                       /*
> -                        * Task has not woken up but we are exiting the polling
> -                        * loop anyway. Require a barrier after polling is
> -                        * cleared to order subsequent test of need_resched().
> -                        */
>                         dev->poll_time_limit = true;
> -                       clear_thread_flag(TIF_POLLING_NRFLAG);
> -                       smp_mb();
>                         break;
>                 }
>         }
>
>         HMT_medium();
> -       clear_thread_flag(TIF_POLLING_NRFLAG);
> -
>         raw_local_irq_disable();
>
>         pseries_idle_epilog();
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 49078cc83f4a..9eb811b5d8b6 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -236,8 +236,8 @@ noinstr int cpuidle_enter_state(struct cpuidle_device *dev,
>                 broadcast = false;
>         }
>
> -       polling = target_state->flags & CPUIDLE_FLAG_POLLING_HARD;
> -
> +       polling = (target_state->flags & (CPUIDLE_FLAG_POLLING_SOFT |
> +                                         CPUIDLE_FLAG_POLLING_HARD));

The outer parens are not needed on the right-hand side, or apply !! to it.
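
As a small illustration of the two spellings mentioned above, here is a sketch that assumes the destination is a C99 bool. The MODEL_* values mirror BIT(0) and BIT(7) from the quoted header, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical values mirroring BIT(0) and BIT(7) from the quoted header. */
#define MODEL_FLAG_POLLING_SOFT (1u << 0)
#define MODEL_FLAG_POLLING_HARD (1u << 7)

/* Conversion to bool already collapses any nonzero mask result to true. */
static bool model_polls_plain(unsigned int flags)
{
	bool polling = flags & (MODEL_FLAG_POLLING_SOFT | MODEL_FLAG_POLLING_HARD);

	return polling;
}

/* Same result, with the normalization spelled out via !!. */
static bool model_polls_explicit(unsigned int flags)
{
	return !!(flags & (MODEL_FLAG_POLLING_SOFT | MODEL_FLAG_POLLING_HARD));
}
```

With a bool destination both functions return the same value for every input, so the choice between them is stylistic.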

>         /*
>          * If the target state doesn't poll on need_resched(), this is
>          * the last check after which further TIF_NEED_RESCHED remote setting
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index a2fe173de117..3bfa251b344a 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -13,35 +13,29 @@
>  static int __cpuidle poll_idle(struct cpuidle_device *dev,
>                                struct cpuidle_driver *drv, int index)
>  {
> -       u64 time_start;
> -
> -       time_start = local_clock_noinstr();
> +       u64 time_start = local_clock_noinstr();
> +       unsigned int loop_count = 0;
> +       u64 limit;
>
>         dev->poll_time_limit = false;
>
>         raw_local_irq_enable();
> -       if (!current_set_polling_and_test()) {
> -               unsigned int loop_count = 0;
> -               u64 limit;
>
> -               limit = cpuidle_poll_time(drv, dev);
> +       limit = cpuidle_poll_time(drv, dev);
>
> -               while (!need_resched()) {
> -                       cpu_relax();
> -                       if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> -                               continue;
> +       while (!need_resched()) {
> +               cpu_relax();
> +               if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> +                       continue;
>
> -                       loop_count = 0;
> -                       if (local_clock_noinstr() - time_start > limit) {
> -                               dev->poll_time_limit = true;
> -                               break;
> -                       }
> +               loop_count = 0;
> +               if (local_clock_noinstr() - time_start > limit) {
> +                       dev->poll_time_limit = true;
> +                       break;
>                 }
>         }
>         raw_local_irq_disable();
>
> -       current_clr_polling();
> -
>         return index;
>  }
>
> --


* Re: [PATCH 4/7] cpuidle: s/CPUIDLE_FLAG_POLLING/CPUIDLE_FLAG_POLLING_SOFT
  2023-12-12 13:09   ` Rafael J. Wysocki
@ 2024-02-08 16:37     ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2024-02-08 16:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Daniel Lezcano, linux-pm, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Peter Zijlstra

On Tue, Dec 12, 2023 at 02:09:38PM +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 24, 2023 at 11:32 PM Frederic Weisbecker
> <frederic@kernel.org> wrote:
> >
> > In order to further distinguish software and hardware TIF_NEED_RESCHED
> > polling cpuidle states, rename CPUIDLE_FLAG_POLLING to
> > CPUIDLE_FLAG_POLLING_SOFT before introducing CPUIDLE_FLAG_POLLING_HARD
> > and tag mwait users with it.
> 
> Well, if MWAIT users are the only category that will be tagged with
> the new flag, it can be called CPUIDLE_FLAG_POLLING_MWAIT or even
> CPUIDLE_FLAG_MWAIT for that matter and the $subject patch won't be
> necessary any more AFAICS.

Yep.

> 
> > This will allow cpuidle core to manage TIF_NR_POLLING on behalf of all
> > kinds of TIF_NEED_RESCHED polling states while keeping a necessary
> > distinction for the governors between software loops polling on
> > TIF_NEED_RESCHED and hardware monitored writes to thread flags.
> 
> Fair enough, but what about using a different name for the new flag
> and leaving the old one as is?

Sounds good. Will do.

Thanks!


* Re: [PATCH 5/7] cpuidle: Introduce CPUIDLE_FLAG_POLLING_HARD
  2023-12-12 13:12   ` Rafael J. Wysocki
@ 2024-02-08 16:43     ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2024-02-08 16:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen

On Tue, Dec 12, 2023 at 02:12:48PM +0100, Rafael J. Wysocki wrote:
> > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> > index 66b59868622c..873fdf200dc3 100644
> > --- a/include/linux/cpuidle.h
> > +++ b/include/linux/cpuidle.h
> > @@ -78,13 +78,14 @@ struct cpuidle_state {
> >
> >  /* Idle State Flags */
> >  #define CPUIDLE_FLAG_NONE              (0x00)
> > -#define CPUIDLE_FLAG_POLLING_SOFT              BIT(0) /* polling state */
> > +#define CPUIDLE_FLAG_POLLING_SOFT      BIT(0) /* software need_resched() polling state */
> >  #define CPUIDLE_FLAG_COUPLED           BIT(1) /* state applies to multiple cpus */
> >  #define CPUIDLE_FLAG_TIMER_STOP        BIT(2) /* timer is stopped on this state */
> >  #define CPUIDLE_FLAG_UNUSABLE          BIT(3) /* avoid using this state */
> >  #define CPUIDLE_FLAG_OFF               BIT(4) /* disable this state by default */
> >  #define CPUIDLE_FLAG_TLB_FLUSHED       BIT(5) /* idle-state flushes TLBs */
> >  #define CPUIDLE_FLAG_RCU_IDLE          BIT(6) /* idle-state takes care of RCU */
> > +#define CPUIDLE_FLAG_POLLING_HARD      BIT(7) /* hardware need_resched() polling state */
> 
> Hardware need_resched() monitoring rather?  This doesn't do what
> "polling" usually means AFAICS.

Fair enough!

> >
> >  struct cpuidle_device_kobj;
> >  struct cpuidle_state_kobj;
> > --


* Re: [PATCH 6/7] cpuidle: Handle TIF_NR_POLLING on behalf of CPUIDLE_FLAG_POLLING_HARD states
  2023-12-12 13:21   ` Rafael J. Wysocki
@ 2024-02-08 17:03     ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2024-02-08 17:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Peter Zijlstra, Daniel Lezcano, linux-pm, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen

On Tue, Dec 12, 2023 at 02:21:52PM +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 24, 2023 at 11:32 PM Frederic Weisbecker
> > -}
> > -
> > -static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> > -                     int next_state)
> 
> Since you are removing call_cpuidle(), you may as well remove
> call_cpuidle_s2idle() which only has one caller anyway.

Will do so in a separate patch. Thanks!
