From: Doug Anderson <dianders@chromium.org>
To: Hillf Danton <hdanton@sina.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joelaf@google.com>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Mel Gorman <mgorman@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/rt: Don't reschedule a throttled task even if it's higher priority
Date: Wed, 1 Dec 2021 16:50:44 -0800
Message-ID: <CAD=FV=VL_xrQu3Bvb9GFcfSaOpTF_x5dWPhZe60SWC3TinaLqA@mail.gmail.com>
In-Reply-To: <20211201113052.2025-1-hdanton@sina.com>

Hi,

On Wed, Dec 1, 2021 at 3:31 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 15 Nov 2021 17:02:45 -0800 Douglas Anderson wrote:
> > While testing RT_GROUP_SCHED, I found that my system would go bonkers
> > if my test RT tasks ever got throttled (even if my test RT tasks were
> > set to only get a tiny slice of CPU time). Specifically I found that
> > whenever my test RT tasks were throttled that all other RT tasks in
> > the system were being starved (!!). Several important RT tasks in the
> > kernel were suddenly getting almost no timeslices and my system became
> > unusable.
> >
> > After some experimentation, I determined that this behavior only
> > happened when I gave my test RT tasks a high priority. If I gave my
> > test RT tasks a low priority then they were throttled as expected and
> > nothing was starved.
> >
> > I managed to come up with a test case that hopefully anyone can run to
> > demonstrate the problem. The test case uses shell commands and python
> > but certainly you could reproduce in other ways:
> >
> > echo "Allow 20 ms more of RT at system and top cgroup"
> > old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
> > echo $((old_rt + 20000)) > /proc/sys/kernel/sched_rt_runtime_us
> > old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
> > echo $((old_rt + 20000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us
> >
> > echo "Give 10 ms each to spinny and printy groups"
> > mkdir /sys/fs/cgroup/cpu/spinny
> > echo 10000 > /sys/fs/cgroup/cpu/spinny/cpu.rt_runtime_us
> > mkdir /sys/fs/cgroup/cpu/printy
> > echo 10000 > /sys/fs/cgroup/cpu/printy/cpu.rt_runtime_us
> >
> > echo "Fork off a printy thing to be a nice RT citizen"
> > echo "Prints once per second. Priority only 1."
> > python -c "import time;
> > last_time = time.time()
> > while True:
> >   time.sleep(1)
> >   now_time = time.time()
> >   print('Time fies %f' % (now_time - last_time))
> >   last_time = now_time" &
> > pid=$!
> > echo "Give python a few seconds to get started"
> > sleep 3
> > echo $pid >> /sys/fs/cgroup/cpu/printy/tasks
> > chrt -p -f 1 $pid
> >
> > echo "Sleep to observe that everything is peachy"
> > sleep 3
> >
> > echo "Fork off a bunch of evil spinny things"
> > echo "Chews CPU time. Priority 99."
> > for i in $(seq 13); do
> >   python -c "while True: pass"&
> >   pid=$!
> >   echo $pid >> /sys/fs/cgroup/cpu/spinny/tasks
> >   chrt -p -f 99 $pid
> > done
> >
> > echo "Huh? Almost no more prints?"
> >
> > I believe that the problem is an "if" test that's been in
> > push_rt_task() forever where we will just reschedule the current task
> > if it's higher priority than the next one. If I just remove that
> > special case then everything works for me. I tried making it
> > conditional on just `!rq->rt.rt_throttled` but for whatever reason
> > that wasn't enough. The `if` test looks like an unlikely special case
> > optimization and it seems like things ought to be fine without it.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> > I know less than zero about the scheduler (so if I told you something,
> > it's better than 50% chance the opposite is true!). Here I'm
> > asserting that we totally don't need this special case and the system
> > will be fine without it, but I actually don't have any data to back
> > that up. If nothing else, hopefully my test case in the commit message
> > would let someone else reproduce and see what I'm talking about and
> > can come up with a better fix.
>
> Can you try to tune the knob down to somewhere like 1ms?
>
> Hillf
>
> /*
>  * period over which we measure -rt task CPU usage in us.
>  * default: 1s
>  */
> unsigned int sysctl_sched_rt_period = 1000000;

I could give it a shot, but that's a pretty big behavior change and
the Documentation (sched-rt-group.rst) warns me away from such a
thing. The default of 1 second seems crazy conservative, but tweaking
it all the way down to 1 ms seems a bit aggressive. It also feels like
this would only work around the problem rather than fix the root
cause?
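
For reference, here is a rough sketch of what a 1 ms period would mean
for the budgets involved, assuming each runtime/period ratio should stay
the same as it is with the 1 s period. The helper name scale_rt_runtime
is made up for illustration, and the 950000 us figure is the usual
default for sched_rt_runtime_us rather than something measured on my
system. Nothing here touches /proc or the cgroup files; it is only the
arithmetic.

```shell
# Hypothetical helper (not part of the kernel or of the test case above):
# scale an RT runtime budget so that runtime/period stays the same when the
# period changes. All values are in microseconds; integer division rounds
# down.
scale_rt_runtime() {
  old_runtime=$1
  old_period=$2
  new_period=$3
  echo $(( old_runtime * new_period / old_period ))
}

# Assumed default global budget (950000 us out of 1000000 us) at a 1 ms period.
scale_rt_runtime 950000 1000000 1000

# The 10 ms spinny/printy group budgets from the test case at a 1 ms period.
scale_rt_runtime 10000 1000000 1000
```

Under those assumptions the groups in the test case would each be left
with a budget on the order of tens of microseconds per period, which is
part of why this feels like a big behavior change.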

-Doug
