All of lore.kernel.org
 help / color / mirror / Atom feed
From: Douglas Anderson <dianders@chromium.org>
To: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joelaf@google.com>,
	Douglas Anderson <dianders@chromium.org>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-kernel@vger.kernel.org
Subject: [PATCH] sched/rt: Don't reschedule a throttled task even if it's higher priority
Date: Mon, 15 Nov 2021 17:02:45 -0800	[thread overview]
Message-ID: <20211115170241.1.I94825a614577505bd1a8be9aeff208a49cb39b3d@changeid> (raw)

While testing RT_GROUP_SCHED, I found that my system would go bonkers
if my test RT tasks ever got throttled (even if my test RT tasks were
set to only get a tiny slice of CPU time). Specifically I found that
whenever my test RT tasks were throttled that all other RT tasks in
the system were being starved (!!). Several important RT tasks in the
kernel were suddenly getting almost no timeslices and my system became
unusable.

After some experimentation, I determined that this behavior only
happened when I gave my test RT tasks a high priority. If I gave my
test RT tasks a low priority then they were throttled as expected and
nothing was starved.

I managed to come up with a test case that hopefully anyone can run to
demonstrate the problem. The test case uses shell commands and python
but certainly you could reproduce in other ways:

echo "Allow 20 ms more of RT at system and top cgroup"
old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
echo $((old_rt + 20000)) > /proc/sys/kernel/sched_rt_runtime_us
old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
echo $((old_rt + 20000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us

echo "Give 10 ms each to spinny and printy groups"
mkdir /sys/fs/cgroup/cpu/spinny
echo 10000 > /sys/fs/cgroup/cpu/spinny/cpu.rt_runtime_us
mkdir /sys/fs/cgroup/cpu/printy
echo 10000 > /sys/fs/cgroup/cpu/printy/cpu.rt_runtime_us

echo "Fork off a printy thing to be a nice RT citizen"
echo "Prints once per second. Priority only 1."
python -c "import time;
last_time = time.time()
while True:
  time.sleep(1)
  now_time = time.time()
  print('Time fies %f' % (now_time - last_time))
  last_time = now_time" &
pid=$!
echo "Give python a few seconds to get started"
sleep 3
echo $pid >> /sys/fs/cgroup/cpu/printy/tasks
chrt -p -f 1 $pid

echo "Sleep to observe that everything is peachy"
sleep 3

echo "Fork off a bunch of evil spinny things"
echo "Chews CPU time. Priority 99."
for i in $(seq 13); do
  python -c "while True: pass"&
  pid=$!
  echo $pid >> /sys/fs/cgroup/cpu/spinny/tasks
  chrt -p -f 99 $pid
done

echo "Huh? Almost no more prints?"

I believe that the problem is an "if" test that's been in
push_rt_task() forever where we will just reschedule the current task
if it's higher priority than the next one. If I just remove that
special case then everything works for me. I tried making it
conditional on just `!rq->rt.rt_throttled` but for whatever reason
that wasn't enough. The `if` test looks like an unlikely special case
optimization and it seems like things ought to be fine without it.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
I know less than zero about the scheduler (so if I told you something,
it's better than 50% chance the the opposite is true!). Here I'm
asserting that we totally don't need this special case and the system
will be fine without it, but I actually don't have any data to back
that up. If nothing else, hopefully my test case in the commit message
would let someone else reproduce and see what I'm talking about and
can come up with a better fix.

 kernel/sched/rt.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index b48baaba2fc2..93ea5de0f917 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2048,16 +2048,6 @@ static int push_rt_task(struct rq *rq, bool pull)
 	if (WARN_ON(next_task == rq->curr))
 		return 0;
 
-	/*
-	 * It's possible that the next_task slipped in of
-	 * higher priority than current. If that's the case
-	 * just reschedule current.
-	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
-		resched_curr(rq);
-		return 0;
-	}
-
 	/* We might release rq lock */
 	get_task_struct(next_task);
 
-- 
2.34.0.rc1.387.gb447b232ab-goog


             reply	other threads:[~2021-11-16  4:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16  1:02 Douglas Anderson [this message]
2021-11-30 16:30 ` [PATCH] sched/rt: Don't reschedule a throttled task even if it's higher priority Doug Anderson
2021-11-30 23:36   ` Joel Fernandes
2021-12-02 11:11     ` Qais Yousef
2021-12-02 18:05       ` Doug Anderson
2021-12-13 13:08         ` Qais Yousef
2021-12-13 18:32           ` Doug Anderson
     [not found] ` <20211201113052.2025-1-hdanton@sina.com>
2021-12-02  0:50   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211115170241.1.I94825a614577505bd1a8be9aeff208a49cb39b3d@changeid \
    --to=dianders@chromium.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.