linux-kernel.vger.kernel.org archive mirror
* RT_GROUP_SCHED throttling blocks unthrottled RT tasks?
@ 2021-11-05 17:44 Doug Anderson
  2021-11-05 23:04 ` Joel Fernandes
  2021-11-16  1:13 ` Doug Anderson
  0 siblings, 2 replies; 5+ messages in thread
From: Doug Anderson @ 2021-11-05 17:44 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt; +Cc: Joel Fernandes, LKML

Hi,

I'm seeing a strange behavior that I _think_ is a bug. I'm hoping that
some of the scheduling experts can tell me if I'm just
misunderstanding or if this is truly a bug. To see it, I do this:

--

# Allow 1000 us more of RT at system and top cgroup
old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
echo $((old_rt + 1000)) > /proc/sys/kernel/sched_rt_runtime_us
old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
echo $((old_rt + 1000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us

# Give the 1000 us to my own group
mkdir /sys/fs/cgroup/cpu/doug
echo 1000 > /sys/fs/cgroup/cpu/doug/cpu.rt_runtime_us

# Fork off a bunch of spinny things
for i in $(seq 13); do
  python -c "while True: pass"&
done

# Make my spinny things RT and put in my group
# (assumes no other python is running!)
for pid in $(ps aux | grep python | grep -v grep | awk '{print $2}'); do
  echo $pid >> /sys/fs/cgroup/cpu/doug/tasks
  chrt -p -f 99 $pid
done

--

As expected, the spinny python tasks are pretty much throttled down to
0 in the above (they get 1 ms out of 1 second).

However, _the bug_ is that the above basically causes all _other_ RT
things in my system to stop functioning. I'm on an ARM Chromebook
(sc7180-trogdor) and we communicate to our EC on a "realtime" thread
due to SPI timing requirements. The above commands appear to starve
the EC's communication task and (as far as I can tell) other RT tasks
in the system.

Notably:

a) My EC comms slow to a crawl (eventually one gets through).

b) "top" shows stuff like this:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  179 root     -51   0       0      0      0 R 100.0   0.0   0:31.79 cros_ec_spi_hig
  180 root     -51   0       0      0      0 R  95.2   0.0   0:50.19 irq/169-chromeo
  184 root     -51   0       0      0      0 R  95.2   0.0   0:13.24 spi10
  221 root      -2   0       0      0      0 R  95.2   0.0   0:50.57 ring0

c) If I give my spinny tasks just a little bit more time than 1 ms
then I get a hung task.


When I'm testing the above, the non-RT stuff in the system continues
to work OK though. I can even go in and kill all my python tasks and
the system returns to normal.

I tried gathering some tracing. One bit that might (?) be relevant:

 cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking: comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
 cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup: comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
          <idle>-0       [001] d.H5  1495.309113: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
          <idle>-0       [001] d.H6  1495.309119: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
          <idle>-0       [001] d.H5  1495.312137: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
          <idle>-0       [001] d.H6  1495.312142: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking: comm=sugov:6 pid=2658 prio=-1 target_cpu=006
 cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup: comm=sugov:6 pid=2658 prio=-1 target_cpu=006

My best guess is that there's some bug in the scheduler where it just
loops constantly picking an unthrottled RT task but then incorrectly
decides that it's throttled and thus doesn't run it.

Most of my testing has been on the chromeos-5.4 kernel, but just in
case I tried a vanilla v5.15 kernel and I could reproduce the same
problems.


Anyway, if I'm just doing something stupid then I apologize for the
noise. If the above should work and you need me to gather more logging
/ try any experiments, I'm happy to do so.

Thanks!

-Doug


* Re: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?
  2021-11-05 17:44 RT_GROUP_SCHED throttling blocks unthrottled RT tasks? Doug Anderson
@ 2021-11-05 23:04 ` Joel Fernandes
  2021-11-05 23:05   ` Joel Fernandes
  2021-11-16  1:13 ` Doug Anderson
  1 sibling, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2021-11-05 23:04 UTC (permalink / raw)
  To: Doug Anderson; +Cc: Peter Zijlstra, Steven Rostedt, LKML

On Fri, Nov 5, 2021 at 1:44 PM Doug Anderson <dianders@chromium.org> wrote:
[..]
>
>
> I tried gathering some tracing. One bit that might (?) be relevant:
>
>  cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking:
> comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
>  cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup:
> comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
>           <idle>-0       [001] d.H5  1495.309113: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H6  1495.309119: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H5  1495.312137: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H6  1495.312142: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>
> My best guess is that there's some bug in the scheduler where it just
> loops constantly picking an unthrottled RT task but then incorrectly
> decides that it's throttled and thus doesn't run it.

Thanks for posting this. Tricky bit indeed. I was wondering if the
issue is here:
https://elixir.bootlin.com/linux/latest/source/kernel/sched/rt.c#L1031

Basically, if something is actively getting RT-throttled on a CPU, and
something else that is RT but unthrottled is ready to run, the
scheduler does resched_curr(rq) anyway. This probably happens when
update_curr_rt() is called on the unthrottled RT task via
task_tick_rt(). That seems like a limitation of the current RT
scheduler, I think? Thoughts?
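
For reference, here is roughly what the tail of update_curr_rt() in
kernel/sched/rt.c looks like around v5.15 (paraphrased rather than
quoted verbatim, so details and line numbers may differ): it charges
the current task's runtime to every rt_rq in its group hierarchy and
asks for a reschedule as soon as any of them has exceeded its budget.

	/* Charge delta_exec to each rt_rq in the current task's hierarchy. */
	for_each_sched_rt_entity(rt_se) {
		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);

		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
			raw_spin_lock(&rt_rq->rt_runtime_lock);
			rt_rq->rt_time += delta_exec;
			/* Reschedule whoever is current if the budget ran out. */
			if (sched_rt_runtime_exceeded(rt_rq))
				resched_curr(rq);
			raw_spin_unlock(&rt_rq->rt_runtime_lock);
		}
	}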


* Re: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?
  2021-11-05 23:04 ` Joel Fernandes
@ 2021-11-05 23:05   ` Joel Fernandes
  2021-11-05 23:10     ` Joel Fernandes
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2021-11-05 23:05 UTC (permalink / raw)
  To: Doug Anderson; +Cc: Peter Zijlstra, Steven Rostedt, LKML

On Fri, Nov 5, 2021 at 7:04 PM Joel Fernandes <joelaf@google.com> wrote:
>
> On Fri, Nov 5, 2021 at 1:44 PM Doug Anderson <dianders@chromium.org> wrote:
> [..]
> >
> >
> > I tried gathering some tracing. One bit that might (?) be relevant:
> >
> >  cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking:
> > comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
> >  cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup:
> > comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
> >           <idle>-0       [001] d.H5  1495.309113: sched_waking:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >           <idle>-0       [001] d.H6  1495.309119: sched_wakeup:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >  cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >  cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >           <idle>-0       [001] d.H5  1495.312137: sched_waking:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >           <idle>-0       [001] d.H6  1495.312142: sched_wakeup:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >  cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >  cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup:
> > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> >
> > My best guess is that there's some bug in the scheduler where it just
> > loops constantly picking an unthrottled RT task but then incorrectly
> > decides that it's throttled and thus doesn't run it.
>
> Thanks for posting this. Tricky bit indeed. I was wondering if the
> issue is here:
> https://elixir.bootlin.com/linux/latest/source/kernel/sched/rt.c#L1031

Errm, I meant Line 1060 in rt.c
https://elixir.bootlin.com/linux/latest/source/kernel/sched/rt.c#L1060


* Re: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?
  2021-11-05 23:05   ` Joel Fernandes
@ 2021-11-05 23:10     ` Joel Fernandes
  0 siblings, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2021-11-05 23:10 UTC (permalink / raw)
  To: Doug Anderson; +Cc: Peter Zijlstra, Steven Rostedt, LKML

On Fri, Nov 5, 2021 at 7:05 PM Joel Fernandes <joelaf@google.com> wrote:
>
> On Fri, Nov 5, 2021 at 7:04 PM Joel Fernandes <joelaf@google.com> wrote:
> >
> > On Fri, Nov 5, 2021 at 1:44 PM Doug Anderson <dianders@chromium.org> wrote:
> > [..]
> > >
> > >
> > > I tried gathering some tracing. One bit that might (?) be relevant:
> > >
> > >  cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking:
> > > comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
> > >  cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup:
> > > comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
> > >           <idle>-0       [001] d.H5  1495.309113: sched_waking:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >           <idle>-0       [001] d.H6  1495.309119: sched_wakeup:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >  cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >  cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >           <idle>-0       [001] d.H5  1495.312137: sched_waking:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >           <idle>-0       [001] d.H6  1495.312142: sched_wakeup:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >  cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >  cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup:
> > > comm=sugov:6 pid=2658 prio=-1 target_cpu=006
> > >
> > > My best guess is that there's some bug in the scheduler where it just
> > > loops constantly picking an unthrottled RT task but then incorrectly
> > > decides that it's throttled and thus doesn't run it.
> >
> > Thanks for posting this. Tricky bit indeed. I was wondering if the
> > issue is here:
> > https://elixir.bootlin.com/linux/latest/source/kernel/sched/rt.c#L1031
>
> Errm, I meant Line 1060 in rt.c
> https://elixir.bootlin.com/linux/latest/source/kernel/sched/rt.c#L1060

AAARGH, I hate elixir. I meant this code in update_curr_rt():

if (sched_rt_runtime_exceeded(rt_rq))
    resched_curr(rq);
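
For completeness, here is the path that reaches that check on every
scheduler tick, again paraphrased from a ~v5.15 tree rather than quoted
verbatim: task_tick_rt() calls update_curr_rt() for whichever RT task
is currently on the CPU, whether or not that task's group is the one
being throttled.

static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
{
	update_curr_rt(rq);
	/* ... PELT update, watchdog, and SCHED_RR timeslice handling ... */
}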


* Re: RT_GROUP_SCHED throttling blocks unthrottled RT tasks?
  2021-11-05 17:44 RT_GROUP_SCHED throttling blocks unthrottled RT tasks? Doug Anderson
  2021-11-05 23:04 ` Joel Fernandes
@ 2021-11-16  1:13 ` Doug Anderson
  1 sibling, 0 replies; 5+ messages in thread
From: Doug Anderson @ 2021-11-16  1:13 UTC (permalink / raw)
  To: Peter Zijlstra, Steven Rostedt; +Cc: Joel Fernandes, LKML

Hi,

On Fri, Nov 5, 2021 at 10:44 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> I'm seeing a strange behavior that I _think_ is a bug. I'm hoping that
> some of the scheduling experts can tell me if I'm just
> misunderstanding or if this is truly a bug. To see it, I do this:
>
> --
>
> # Allow 1000 us more of RT at system and top cgroup
> old_rt=$(cat /proc/sys/kernel/sched_rt_runtime_us)
> echo $((old_rt + 1000)) > /proc/sys/kernel/sched_rt_runtime_us
> old_rt=$(cat /sys/fs/cgroup/cpu/cpu.rt_runtime_us)
> echo $((old_rt + 1000)) > /sys/fs/cgroup/cpu/cpu.rt_runtime_us
>
> # Give the 1000 us to my own group
> mkdir /sys/fs/cgroup/cpu/doug
> echo 1000 > /sys/fs/cgroup/cpu/doug/cpu.rt_runtime_us
>
> # Fork off a bunch of spinny things
> for i in $(seq 13); do
>   python -c "while True: pass"&
> done
>
> # Make my spinny things RT and put in my group
> # (assumes no other python is running!)
> for pid in $(ps aux | grep python | grep -v grep | awk '{print $2}'); do
>   echo $pid >> /sys/fs/cgroup/cpu/doug/tasks
>   chrt -p -f 99 $pid
> done
>
> --
>
> As expected, the spinny python tasks are pretty much throttled down to
> 0 in the above (they get 1 ms out of 1 second).
>
> However, _the bug_ is that the above basically causes all _other_ RT
> things in my system to stop functioning. I'm on an ARM Chromebook
> (sc7180-trogdor) and we communicate to our EC on a "realtime" thread
> due to SPI timing requirements. The above commands appear to starve
> the EC's communication task and (as far as I can tell) other RT tasks
> in the system.
>
> Notably:
>
> a) My EC comms slow to a crawl (eventually one gets through).
>
> b) "top" shows stuff like this:
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
> COMMAND
>   179 root     -51   0       0      0      0 R 100.0   0.0   0:31.79
> cros_ec_spi_hig
>   180 root     -51   0       0      0      0 R  95.2   0.0   0:50.19
> irq/169-chromeo
>   184 root     -51   0       0      0      0 R  95.2   0.0   0:13.24
> spi10
>   221 root      -2   0       0      0      0 R  95.2   0.0   0:50.57
> ring0
>
> c) If I give my spinny tasks just a little bit more time than 1 ms
> then I get a hung task.
>
>
> When I'm testing the above, the non-RT stuff in the system continues
> to work OK though. I can even go in and kill all my python tasks and
> the system returns to normal.
>
> I tried gathering some tracing. One bit that might (?) be relevant:
>
>  cros_ec_spi_hig-179     [000] d.h5  1495.305919: sched_waking:
> comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
>  cros_ec_spi_hig-179     [000] d.h6  1495.305926: sched_wakeup:
> comm=kworker/4:2 pid=5232 prio=120 target_cpu=004
>           <idle>-0       [001] d.H5  1495.309113: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H6  1495.309119: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h5  1495.309336: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h6  1495.309341: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H5  1495.312137: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>           <idle>-0       [001] d.H6  1495.312142: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h5  1495.312859: sched_waking:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>  cros_ec_spi_hig-179     [000] d.h6  1495.312870: sched_wakeup:
> comm=sugov:6 pid=2658 prio=-1 target_cpu=006
>
> My best guess is that there's some bug in the scheduler where it just
> loops constantly picking an unthrottled RT task but then incorrectly
> decides that it's throttled and thus doesn't run it.
>
> Most of my testing has been on the chromeos-5.4 kernel, but just in
> case I tried a vanilla v5.15 kernel and I could reproduce the same
> problems.
>
>
> Anyway, if I'm just doing something stupid then I apologize for the
> noise. If the above should work and you need me to gather more logging
> / try any experiments, I'm happy to do so.

I spent a little more time here and:

a) Managed to come up with a test case that makes it easy for anyone
to reproduce the problem themselves.

b) Managed to come up with a short patch that "fixes" this problem for me.

Since I had a potential fix, I posted it as a patch. Maybe we can move
the discussion over to that patch posting, which will hopefully show
up at:

https://lore.kernel.org/r/20211115170241.1.I94825a614577505bd1a8be9aeff208a49cb39b3d@changeid

...oddly, that patch isn't showing up (?). It should be "[PATCH]
sched/rt: Don't reschedule a throttled task even if it's higher
priority". If it's not there tomorrow, I can try reposting it.

-Doug

