* yielding while running SCHED_DEADLINE
@ 2018-09-14 23:13 Patel, Vedang
2018-09-17 9:26 ` Juri Lelli
0 siblings, 1 reply; 5+ messages in thread
From: Patel, Vedang @ 2018-09-14 23:13 UTC (permalink / raw)
To: linux-rt-users; +Cc: Koppolu, Chanakya
Hi all,
We have been playing around with SCHED_DEADLINE and found a
discrepancy in the calculation of nr_involuntary_switches and
nr_voluntary_switches in /proc/${PID}/sched.
Whenever the task finishes its work early and executes sched_yield()
to voluntarily give up the CPU, this increments
nr_involuntary_switches. It should have incremented
nr_voluntary_switches.
This can be easily demonstrated by running the cyclicdeadline task,
which is part of rt-tests
(https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/), and
checking the value of nr_voluntary_switches.
Please note that the issue seems to be with sched_yield() and not
SCHED_DEADLINE because we have seen similar behavior when we tried
switching to other policies. But we are using SCHED_DEADLINE because
it is one of the (very) few scenarios where sched_yield() can be used
correctly.
Some analysis:
--------------
I enabled the sched/sched_switch (with cyclicdeadline set as the
filter) and syscalls/sys_enter_sched_yield events to check whether each
sched_yield() call was resulting in a new task running. I got the
following results:
cyclicdeadline-3290 [003] ....... 3111.132786: tracing_mark_write: start at 3111125101 off=3 (period=3111125098 next=3111126098)
cyclicdeadline-3290 [003] ....1.. 3111.132789: sys_sched_yield()
cyclicdeadline-3290 [003] d...2.. 3111.132797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
cyclicdeadline-3290 [003] ....... 3111.133786: tracing_mark_write: start at 3111126101 off=3 (period=3111126098 next=3111127098)
cyclicdeadline-3290 [003] ....1.. 3111.133789: sys_sched_yield()
cyclicdeadline-3290 [003] d...2.. 3111.133797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
cyclicdeadline-3290 [003] ....... 3111.134786: tracing_mark_write: start at 3111127101 off=3 (period=3111127098 next=3111128098)
cyclicdeadline-3290 [003] ....1.. 3111.134789: sys_sched_yield()
cyclicdeadline-3290 [003] d...2.. 3111.134797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
....
As seen above, every sched_yield() call is followed by a sched_switch
event. So we believe that sched_yield() is actually resulting in a
switch. The values for nr_voluntary_switches/nr_involuntary_switches in
this scenario:
nr_switches : 138753
nr_voluntary_switches : 1
nr_involuntary_switches : 138752
Looking at __schedule() in kernel/sched/core.c, the switch is counted
as part of nr_involuntary_switches if the task has not been preempted
and is in the TASK_RUNNING state. This does not seem to happen when
sched_yield() is called.
Is there something we are missing here? Or is this a known issue
that is planned to be fixed later?
Thanks,
Vedang Patel
* Re: yielding while running SCHED_DEADLINE
2018-09-14 23:13 yielding while running SCHED_DEADLINE Patel, Vedang
@ 2018-09-17 9:26 ` Juri Lelli
2018-09-17 11:42 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Juri Lelli @ 2018-09-17 9:26 UTC (permalink / raw)
To: Patel, Vedang; +Cc: linux-rt-users, Koppolu, Chanakya, Peter Zijlstra
Hi,
On 14/09/18 23:13, Patel, Vedang wrote:
> Hi all,
>
> We have been playing around with SCHED_DEADLINE and found a
> discrepancy in the calculation of nr_involuntary_switches and
> nr_voluntary_switches in /proc/${PID}/sched.
>
> Whenever the task finishes its work early and executes sched_yield()
> to voluntarily give up the CPU, this increments
> nr_involuntary_switches. It should have incremented
> nr_voluntary_switches.
Mmm, I see what you are saying.
[...]
> Looking at __schedule() in kernel/sched/core.c, the switch is counted
> as part of nr_involuntary_switches if the task has not been preempted
> and is in the TASK_RUNNING state. This does not seem to happen when
> sched_yield() is called.
Mmm,
- nr_voluntary_switches++ if !preempt && !RUNNING
- nr_involuntary_switches++ otherwise (yield fits this as the task is
still RUNNING, even though throttled for DEADLINE)
Not sure this is the same as what you say above..
> Is there something we are missing here? Or is this a known issue
> that is planned to be fixed later?
... however, not sure. Peter, what do you say? It looks like we might
indeed want to account yield as a voluntary switch; it seems to fit. In
this case I guess we could use a flag or add a sched_ bit to task_struct
to handle the case?
Best,
- Juri
* Re: yielding while running SCHED_DEADLINE
2018-09-17 9:26 ` Juri Lelli
@ 2018-09-17 11:42 ` Peter Zijlstra
2018-09-17 17:14 ` Patel, Vedang
2018-09-21 0:19 ` Bowles, Matthew K
0 siblings, 2 replies; 5+ messages in thread
From: Peter Zijlstra @ 2018-09-17 11:42 UTC (permalink / raw)
To: Juri Lelli; +Cc: Patel, Vedang, linux-rt-users, Koppolu, Chanakya
On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> Hi,
>
> On 14/09/18 23:13, Patel, Vedang wrote:
> > Hi all,
> >
> > We have been playing around with SCHED_DEADLINE and found a
> > discrepancy in the calculation of nr_involuntary_switches and
> > nr_voluntary_switches in /proc/${PID}/sched.
> >
> > Whenever the task finishes its work early and executes sched_yield()
> > to voluntarily give up the CPU, this increments
> > nr_involuntary_switches. It should have incremented
> > nr_voluntary_switches.
>
> Mmm, I see what you are saying.
>
> [...]
>
> > Looking at __schedule() in kernel/sched/core.c, the switch is counted
> > as part of nr_involuntary_switches if the task has not been preempted
> > and is in the TASK_RUNNING state. This does not seem to happen when
> > sched_yield() is called.
>
> Mmm,
>
> - nr_voluntary_switches++ if !preempt && !RUNNING
> > - nr_involuntary_switches++ otherwise (yield fits this as the task is
> >   still RUNNING, even though throttled for DEADLINE)
>
> Not sure this is the same as what you say above..
>
> > Is there something we are missing here? Or is this a known issue
> > that is planned to be fixed later?
>
> ... however, not sure. Peter, what do you say? It looks like we might
> indeed want to account yield as a voluntary switch; it seems to fit. In
> this case I guess we could use a flag or add a sched_ bit to
> task_struct to handle the case?
It's been like this _forever_ afaict. This isn't deadline specific
afaict; all yield callers will end up in non-voluntary switches.
I don't know anybody who cares, and I don't think this is something
worth fixing. If someone did rely on this behaviour we'd break them, and
I'd much rather save a cycle than add more stupid stats crap to the
scheduler.
* Re: yielding while running SCHED_DEADLINE
2018-09-17 11:42 ` Peter Zijlstra
@ 2018-09-17 17:14 ` Patel, Vedang
2018-09-21 0:19 ` Bowles, Matthew K
1 sibling, 0 replies; 5+ messages in thread
From: Patel, Vedang @ 2018-09-17 17:14 UTC (permalink / raw)
To: juri.lelli, peterz; +Cc: linux-rt-users, Koppolu, Chanakya
On Mon, 2018-09-17 at 13:42 +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> >
> > Hi,
> >
> > On 14/09/18 23:13, Patel, Vedang wrote:
> > >
> > > Hi all,
> > >
> > > We have been playing around with SCHED_DEADLINE and found a
> > > discrepancy in the calculation of nr_involuntary_switches and
> > > nr_voluntary_switches in /proc/${PID}/sched.
> > >
> > > Whenever the task finishes its work early and executes
> > > sched_yield() to voluntarily give up the CPU, this increments
> > > nr_involuntary_switches. It should have incremented
> > > nr_voluntary_switches.
> > Mmm, I see what you are saying.
> >
> > [...]
> >
> > >
> > > Looking at __schedule() in kernel/sched/core.c, the switch is counted
> > > as part of nr_involuntary_switches if the task has not been preempted
> > > and is in the TASK_RUNNING state. This does not seem to happen when
> > > sched_yield() is called.
> > Mmm,
> >
> > - nr_voluntary_switches++ if !preempt && !RUNNING
> > - nr_involuntary_switches++ otherwise (yield fits this as the task is
> >   still RUNNING, even though throttled for DEADLINE)
> >
> > Not sure this is the same as what you say above..
> >
> > >
> > > Is there something we are missing here? Or is this a known issue
> > > that is planned to be fixed later?
> > ... however, not sure. Peter, what do you say? It looks like we might
> > indeed want to account yield as a voluntary switch; it seems to fit.
> > In this case I guess we could use a flag or add a sched_ bit to
> > task_struct to handle the case?
> It's been like this _forever_ afaict. This isn't deadline specific
> afaict; all yield callers will end up in non-voluntary switches.
>
> I don't know anybody who cares, and I don't think this is something
> worth fixing. If someone did rely on this behaviour we'd break them, and
> I'd much rather save a cycle than add more stupid stats crap to the
> scheduler.
Thanks Peter and Juri for the response.
We will try to use a different mechanism to account for this.
-Vedang
* Re: yielding while running SCHED_DEADLINE
2018-09-17 11:42 ` Peter Zijlstra
2018-09-17 17:14 ` Patel, Vedang
@ 2018-09-21 0:19 ` Bowles, Matthew K
1 sibling, 0 replies; 5+ messages in thread
From: Bowles, Matthew K @ 2018-09-21 0:19 UTC (permalink / raw)
To: linux-rt-users
I’m fine with not fixing this behavior since, as Vedang has mentioned,
we can use different mechanisms to achieve the same goal. However, I
would like to go on the record as someone who cares about this
functionality.
Assuming that sched_yield() was counted as part of
nr_voluntary_switches, the specific scenario in which I would find this
statistic useful is during debugging of latency spikes on a realtime
thread. In particular, I could quickly correlate whether or not a
latency spike occurred due to being swapped out by the scheduler.
On Mon, 2018-09-17 at 13:42 +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> >
> > Hi,
> >
> > On 14/09/18 23:13, Patel, Vedang wrote:
> > >
> > > Hi all,
> > >
> > > We have been playing around with SCHED_DEADLINE and found a
> > > discrepancy in the calculation of nr_involuntary_switches and
> > > nr_voluntary_switches in /proc/${PID}/sched.
> > >
> > > Whenever the task finishes its work early and executes
> > > sched_yield() to voluntarily give up the CPU, this increments
> > > nr_involuntary_switches. It should have incremented
> > > nr_voluntary_switches.
> > Mmm, I see what you are saying.
> >
> > [...]
> >
> > >
> > > Looking at __schedule() in kernel/sched/core.c, the switch is counted
> > > as part of nr_involuntary_switches if the task has not been preempted
> > > and is in the TASK_RUNNING state. This does not seem to happen when
> > > sched_yield() is called.
> > Mmm,
> >
> > - nr_voluntary_switches++ if !preempt && !RUNNING
> > - nr_involuntary_switches++ otherwise (yield fits this as the task is
> >   still RUNNING, even though throttled for DEADLINE)
> >
> > Not sure this is the same as what you say above..
> >
> > >
> > > Is there something we are missing here? Or is this a known issue
> > > that is planned to be fixed later?
> > ... however, not sure. Peter, what do you say? It looks like we might
> > indeed want to account yield as a voluntary switch; it seems to fit.
> > In this case I guess we could use a flag or add a sched_ bit to
> > task_struct to handle the case?
> It's been like this _forever_ afaict. This isn't deadline specific
> afaict; all yield callers will end up in non-voluntary switches.
>
> I don't know anybody who cares, and I don't think this is something
> worth fixing. If someone did rely on this behaviour we'd break them, and
> I'd much rather save a cycle than add more stupid stats crap to the
> scheduler.
Thanks Peter and Juri for the response.
We will try to use a different mechanism to account for this.
-Vedang