All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ankur Arora <ankur.a.arora@oracle.com>
To: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, peterz@infradead.org,
	torvalds@linux-foundation.org, paulmck@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, jpoimboe@kernel.org,
	mark.rutland@arm.com, jgross@suse.com, andrew.cooper3@citrix.com,
	bristot@kernel.org, mathieu.desnoyers@efficios.com,
	geert@linux-m68k.org, glaubitz@physik.fu-berlin.de,
	anton.ivanov@cambridgegreys.com, mattst88@gmail.com,
	krypton@ulrich-teichert.org, rostedt@goodmis.org,
	David.Laight@ACULAB.COM, richard@nod.at, mjguzik@gmail.com,
	jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	Ankur Arora <ankur.a.arora@oracle.com>
Subject: [PATCH 23/30] sched/fair: handle tick expiry under lazy preemption
Date: Mon, 12 Feb 2024 21:55:47 -0800	[thread overview]
Message-ID: <20240213055554.1802415-24-ankur.a.arora@oracle.com> (raw)
In-Reply-To: <20240213055554.1802415-1-ankur.a.arora@oracle.com>

The default policy for lazy scheduling is to schedule in exit-to-user,
assuming that would happen within the remaining time quanta of the
task.

However, that runs into the 'hog' problem -- the target task might
be running in the kernel and might not relinquish CPU on its own.

Handle that by upgrading the ignored tif_resched(NR_lazy) bit to
tif_resched(NR_now) at the next tick.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/87jzshhexi.ffs@tglx/
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

---
Note:
  Instead of special casing the tick, it might be simpler to always
  do the upgrade on the second resched_curr().

  The theoretical problem with doing that is that the current
  approach deterministically provides a well-defined extra unit of
  time. Going with a second resched_curr() might mean that the
  amount of extra time the task gets depends on the vagaries of
  the incoming resched_curr() (which is fine if it's mostly from
  the tick; not fine if we could get it due to other reasons.)

  Practically, both performed equally well in my tests.

  Thoughts?

 kernel/sched/core.c     | 8 ++++++++
 kernel/sched/deadline.c | 2 +-
 kernel/sched/fair.c     | 2 +-
 kernel/sched/rt.c       | 2 +-
 kernel/sched/sched.h    | 6 ++++++
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index de963e8e2bee..5df59a4548dc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1038,6 +1038,10 @@ void wake_up_q(struct wake_q_head *head)
  * For PREEMPT_AUTO: schedule idle threads eagerly, allow everything
  * else, whether running in user or kernel context, to finish its time
  * quanta, and mark for rescheduling at the next exit to user.
+ *
+ * Note: to avoid the hog problem, where the user does not relinquish
+ * CPU even after its time quanta has expired, upgrade lazy to eager
+ * on the second tick.
  */
 static resched_t resched_opt_translate(struct task_struct *curr,
 				       enum resched_opt opt)
@@ -1051,6 +1055,10 @@ static resched_t resched_opt_translate(struct task_struct *curr,
 	if (is_idle_task(curr))
 		return NR_now;
 
+	if (opt == RESCHED_TICK &&
+	    unlikely(test_tsk_need_resched(curr, NR_lazy)))
+		return NR_now;
+
 	return NR_lazy;
 }
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b4e68cfc3c3a..b935e634fbf8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1379,7 +1379,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 		}
 
 		if (!is_leftmost(dl_se, &rq->dl))
-			resched_curr(rq);
+			resched_curr_tick(rq);
 	}
 
 	/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 278eebe6656a..92910b721adb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12621,7 +12621,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 	}
 
 	if (resched)
-		resched_curr(rq);
+		resched_curr_tick(rq);
 
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c57cc8427a57..1a2f3524d0eb 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1023,7 +1023,7 @@ static void update_curr_rt(struct rq *rq)
 			rt_rq->rt_time += delta_exec;
 			exceeded = sched_rt_runtime_exceeded(rt_rq);
 			if (exceeded)
-				resched_curr(rq);
+				resched_curr_tick(rq);
 			raw_spin_unlock(&rt_rq->rt_runtime_lock);
 			if (exceeded)
 				do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3600a8673d08..c7e7acab1022 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2465,6 +2465,7 @@ extern void reweight_task(struct task_struct *p, int prio);
 enum resched_opt {
 	RESCHED_DEFAULT,
 	RESCHED_FORCE,
+	RESCHED_TICK,
 };
 
 extern void __resched_curr(struct rq *rq, enum resched_opt opt);
@@ -2474,6 +2475,11 @@ static inline void resched_curr(struct rq *rq)
 	__resched_curr(rq, RESCHED_DEFAULT);
 }
 
+static inline void resched_curr_tick(struct rq *rq)
+{
+	__resched_curr(rq, RESCHED_TICK);
+}
+
 extern void resched_cpu(int cpu);
 
 extern struct rt_bandwidth def_rt_bandwidth;
-- 
2.31.1


  parent reply	other threads:[~2024-02-13  5:57 UTC|newest]

Thread overview: 155+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13  5:55 [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-02-13  5:55 ` [PATCH 01/30] preempt: introduce CONFIG_PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 02/30] thread_info: selector for TIF_NEED_RESCHED[_LAZY] Ankur Arora
2024-02-19 15:16   ` Thomas Gleixner
2024-02-20 22:50     ` Ankur Arora
2024-02-21 17:05       ` Thomas Gleixner
2024-02-21 18:26   ` Steven Rostedt
2024-02-21 20:03     ` Thomas Gleixner
2024-02-13  5:55 ` [PATCH 03/30] thread_info: tif_need_resched() now takes resched_t as param Ankur Arora
2024-02-14  3:17   ` kernel test robot
2024-02-14 14:08   ` Mark Rutland
2024-02-15  4:08     ` Ankur Arora
2024-02-19 12:30       ` Mark Rutland
2024-02-20 22:09         ` Ankur Arora
2024-02-19 15:21     ` Thomas Gleixner
2024-02-20 22:21       ` Ankur Arora
2024-02-21 17:07         ` Thomas Gleixner
2024-02-21 21:22           ` Ankur Arora
2024-02-13  5:55 ` [PATCH 04/30] sched: make test_*_tsk_thread_flag() return bool Ankur Arora
2024-02-14 14:12   ` Mark Rutland
2024-02-15  2:04     ` Ankur Arora
2024-02-13  5:55 ` [PATCH 05/30] sched: *_tsk_need_resched() now takes resched_t as param Ankur Arora
2024-02-19 15:26   ` Thomas Gleixner
2024-02-20 22:37     ` Ankur Arora
2024-02-21 17:10       ` Thomas Gleixner
2024-02-13  5:55 ` [PATCH 06/30] entry: handle lazy rescheduling at user-exit Ankur Arora
2024-02-19 15:29   ` Thomas Gleixner
2024-02-20 22:38     ` Ankur Arora
2024-02-13  5:55 ` [PATCH 07/30] entry/kvm: handle lazy rescheduling at guest-entry Ankur Arora
2024-02-13  5:55 ` [PATCH 08/30] entry: irqentry_exit only preempts for TIF_NEED_RESCHED Ankur Arora
2024-02-13  5:55 ` [PATCH 09/30] sched: __schedule_loop() doesn't need to check for need_resched_lazy() Ankur Arora
2024-02-13  5:55 ` [PATCH 10/30] sched: separate PREEMPT_DYNAMIC config logic Ankur Arora
2024-02-13  5:55 ` [PATCH 11/30] sched: runtime preemption config under PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 12/30] rcu: limit PREEMPT_RCU to full preemption " Ankur Arora
2024-02-13  5:55 ` [PATCH 13/30] rcu: fix header guard for rcu_all_qs() Ankur Arora
2024-02-13  5:55 ` [PATCH 14/30] preempt,rcu: warn on PREEMPT_RCU=n, preempt=full Ankur Arora
2024-02-13  5:55 ` [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
2024-03-10 10:03   ` Joel Fernandes
2024-03-10 18:56     ` Paul E. McKenney
2024-03-11  0:48       ` Joel Fernandes
2024-03-11  3:56         ` Paul E. McKenney
2024-03-11 15:01           ` Joel Fernandes
2024-03-11 20:51             ` Ankur Arora
2024-03-11 22:12               ` Thomas Gleixner
2024-03-11  5:18         ` Ankur Arora
2024-03-11 15:25           ` Joel Fernandes
2024-03-11 19:12             ` Thomas Gleixner
2024-03-11 19:53               ` Paul E. McKenney
2024-03-11 20:29                 ` Thomas Gleixner
2024-03-12  0:01                   ` Paul E. McKenney
2024-03-12  0:08               ` Joel Fernandes
2024-03-12  3:16                 ` Ankur Arora
2024-03-12  3:24                   ` Joel Fernandes
2024-03-12  5:23                     ` Ankur Arora
2024-02-13  5:55 ` [PATCH 16/30] rcu: force context-switch " Ankur Arora
2024-02-13  5:55 ` [PATCH 17/30] x86/thread_info: define TIF_NEED_RESCHED_LAZY Ankur Arora
2024-02-14 13:25   ` Mark Rutland
2024-02-14 20:31     ` Ankur Arora
2024-02-19 12:32       ` Mark Rutland
2024-02-13  5:55 ` [PATCH 18/30] sched: prepare for lazy rescheduling in resched_curr() Ankur Arora
2024-02-13  5:55 ` [PATCH 19/30] sched: default preemption policy for PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 20/30] sched: handle idle preemption " Ankur Arora
2024-02-13  5:55 ` [PATCH 21/30] sched: schedule eagerly in resched_cpu() Ankur Arora
2024-02-13  5:55 ` [PATCH 22/30] sched/fair: refactor update_curr(), entity_tick() Ankur Arora
2024-02-13  5:55 ` Ankur Arora [this message]
2024-02-21 21:38   ` [PATCH 23/30] sched/fair: handle tick expiry under lazy preemption Steven Rostedt
2024-02-28 13:47   ` Juri Lelli
2024-02-29  6:43     ` Ankur Arora
2024-02-29  9:33       ` Juri Lelli
2024-02-29 23:54         ` Ankur Arora
2024-03-01  0:28           ` Paul E. McKenney
2024-02-13  5:55 ` [PATCH 24/30] sched: support preempt=none under PREEMPT_AUTO Ankur Arora
2024-02-13  5:55 ` [PATCH 25/30] sched: support preempt=full " Ankur Arora
2024-02-13  5:55 ` [PATCH 26/30] sched: handle preempt=voluntary " Ankur Arora
2024-03-03  1:08   ` Joel Fernandes
2024-03-05  8:11     ` Ankur Arora
2024-03-06 20:42       ` Joel Fernandes
2024-03-07 19:01         ` Paul E. McKenney
2024-03-08  0:15           ` Joel Fernandes
2024-03-08  0:42             ` Paul E. McKenney
2024-03-08  4:22               ` Ankur Arora
2024-03-08 21:33                 ` Paul E. McKenney
2024-03-11  4:50                   ` Ankur Arora
2024-03-11 19:26                     ` Paul E. McKenney
2024-03-11 20:09                       ` Ankur Arora
2024-03-11 20:23                         ` Linus Torvalds
2024-03-11 21:03                           ` Ankur Arora
2024-03-12  0:03                           ` Paul E. McKenney
2024-03-12 12:14                             ` Thomas Gleixner
2024-03-12 19:40                               ` Paul E. McKenney
2024-03-08  3:49             ` Ankur Arora
2024-03-08  5:29               ` Joel Fernandes
2024-03-08  6:54               ` Juri Lelli
2024-03-11  5:34                 ` Ankur Arora
2024-02-13  5:55 ` [PATCH 27/30] sched: latency warn for TIF_NEED_RESCHED_LAZY Ankur Arora
2024-02-13  5:55 ` [PATCH 28/30] tracing: support lazy resched Ankur Arora
2024-02-13  5:55 ` [PATCH 29/30] Documentation: tracing: add TIF_NEED_RESCHED_LAZY Ankur Arora
2024-02-21 21:43   ` Steven Rostedt
2024-02-21 23:22     ` Ankur Arora
2024-02-21 23:53       ` Steven Rostedt
2024-03-01 23:33     ` Joel Fernandes
2024-03-02  3:09       ` Ankur Arora
2024-03-03 19:32         ` Joel Fernandes
2024-02-13  5:55 ` [PATCH 30/30] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
2024-02-13  9:47 ` [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Geert Uytterhoeven
2024-02-13 21:46   ` Ankur Arora
2024-02-14 23:57 ` Paul E. McKenney
2024-02-15  2:03   ` Ankur Arora
2024-02-15  3:45     ` Paul E. McKenney
2024-02-15 19:28       ` Paul E. McKenney
2024-02-15 20:04         ` Thomas Gleixner
2024-02-15 20:54           ` Paul E. McKenney
2024-02-15 20:53         ` Ankur Arora
2024-02-15 20:55           ` Paul E. McKenney
2024-02-15 21:24         ` Ankur Arora
2024-02-15 22:54           ` Paul E. McKenney
2024-02-15 22:56             ` Paul E. McKenney
2024-02-16  0:45             ` Ankur Arora
2024-02-16  2:59               ` Paul E. McKenney
2024-02-17  0:55                 ` Paul E. McKenney
2024-02-17  3:59                   ` Ankur Arora
2024-02-18 18:17                     ` Paul E. McKenney
2024-02-19 16:48                       ` Paul E. McKenney
2024-02-21 18:19                         ` Steven Rostedt
2024-02-21 19:41                           ` Paul E. McKenney
2024-02-21 20:11                             ` Steven Rostedt
2024-02-21 20:22                               ` Paul E. McKenney
2024-02-22 15:50                                 ` Mark Rutland
2024-02-22 19:11                                   ` Paul E. McKenney
2024-02-23 11:05                                     ` Mark Rutland
2024-02-23 15:31                                       ` Paul E. McKenney
2024-03-02  1:16                                         ` Paul E. McKenney
2024-03-19 11:45                                           ` Tasks RCU, ftrace, and trampolines (was: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling) Mark Rutland
2024-03-19 23:33                                             ` Paul E. McKenney
2024-02-21  6:48                   ` [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling Ankur Arora
2024-02-21 17:44                     ` Paul E. McKenney
2024-02-16  0:45             ` Ankur Arora
2024-02-21 12:23 ` Raghavendra K T
2024-02-21 17:15   ` Thomas Gleixner
2024-02-21 17:27     ` Raghavendra K T
2024-02-21 21:16       ` Ankur Arora
2024-02-22  4:05         ` Raghavendra K T
2024-02-22 21:23       ` Thomas Gleixner
2024-02-23  3:14         ` Ankur Arora
2024-02-23  6:28           ` Raghavendra K T
2024-02-24  3:15             ` Raghavendra K T
2024-02-27 17:45               ` Ankur Arora
2024-02-22 13:04     ` Raghavendra K T
2024-04-23 15:21 ` Shrikanth Hegde
2024-04-23 16:13   ` Linus Torvalds
2024-04-26  7:46     ` Shrikanth Hegde
2024-04-26 19:00       ` Ankur Arora
2024-05-07 11:16         ` Shrikanth Hegde
2024-05-08  5:18           ` Ankur Arora
2024-05-15 14:31             ` Shrikanth Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240213055554.1802415-24-ankur.a.arora@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.cooper3@citrix.com \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=bharata@amd.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=bristot@kernel.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=geert@linux-m68k.org \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jon.grimm@amd.com \
    --cc=jpoimboe@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=krypton@ulrich-teichert.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mattst88@gmail.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=mjguzik@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=raghavendra.kt@amd.com \
    --cc=richard@nod.at \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.