* yielding while running SCHED_DEADLINE
From: Patel, Vedang @ 2018-09-14 23:13 UTC
  To: linux-rt-users; +Cc: Koppolu, Chanakya

Hi all, 

We have been playing around with SCHED_DEADLINE and found a
discrepancy in how nr_involuntary_switches and nr_voluntary_switches
are counted in /proc/${PID}/sched.

Whenever the task finishes its work early and executes sched_yield()
to voluntarily give up the CPU, this increments
nr_involuntary_switches. It should have incremented
nr_voluntary_switches.

This can be easily demonstrated by running the cyclicdeadline task,
which is part of rt-tests
(https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/), and
checking the value of nr_voluntary_switches.

Please note that the issue seems to be with sched_yield() and not
SCHED_DEADLINE, because we have seen similar behavior when we tried
switching to other policies. But we are using SCHED_DEADLINE because
it is one of the (very) few scenarios where sched_yield() can be used
correctly.
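
For reference, here is a minimal sketch of the kind of periodic task
we are testing. This is our own illustration, not cyclicdeadline
itself; the struct sched_attr layout follows the kernel's uapi
definition, and SYS_sched_setattr may need to be spelled
__NR_sched_setattr depending on your libc headers:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <unistd.h>
  #include <sched.h>
  #include <sys/syscall.h>
  #include <linux/types.h>

  #ifndef SCHED_DEADLINE
  #define SCHED_DEADLINE 6
  #endif

  struct sched_attr {
          __u32 size;
          __u32 sched_policy;
          __u64 sched_flags;
          __s32 sched_nice;      /* SCHED_NORMAL, SCHED_BATCH */
          __u32 sched_priority;  /* SCHED_FIFO, SCHED_RR */
          __u64 sched_runtime;   /* SCHED_DEADLINE, all three in ns */
          __u64 sched_deadline;
          __u64 sched_period;
  };

  static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                           unsigned int flags)
  {
          return syscall(SYS_sched_setattr, pid, attr, flags);
  }

  int main(void)
  {
          struct sched_attr attr = {
                  .size           = sizeof(attr),
                  .sched_policy   = SCHED_DEADLINE,
                  .sched_runtime  =  500 * 1000,  /* 500 us of budget */
                  .sched_deadline = 1000 * 1000,  /* due within 1 ms   */
                  .sched_period   = 1000 * 1000,  /* every 1 ms period */
          };
          int i;

          if (sched_setattr(0, &attr, 0)) {
                  perror("sched_setattr");
                  return 1;
          }
          for (i = 0; i < 100000; i++) {
                  /* do one instance of the periodic work, finish early */
                  sched_yield();  /* give up the rest of the reservation;
                                     we expected this to be counted as a
                                     voluntary switch */
          }
          return 0;
  }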

Some analysis:
--------------

I enabled the sched/sched_switch tracepoint (with cyclicdeadline as
the filter) and the syscalls/sys_enter_sched_yield event to check
whether each sched_yield() call resulted in a new task running. I got
the following results:

  cyclicdeadline-3290  [003] .......  3111.132786: tracing_mark_write: start at 3111125101 off=3 (period=3111125098 next=3111126098)
  cyclicdeadline-3290  [003] ....1..  3111.132789: sys_sched_yield()
  cyclicdeadline-3290  [003] d...2..  3111.132797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
  cyclicdeadline-3290  [003] .......  3111.133786: tracing_mark_write: start at 3111126101 off=3 (period=3111126098 next=3111127098)
  cyclicdeadline-3290  [003] ....1..  3111.133789: sys_sched_yield()
  cyclicdeadline-3290  [003] d...2..  3111.133797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
  cyclicdeadline-3290  [003] .......  3111.134786: tracing_mark_write: start at 3111127101 off=3 (period=3111127098 next=3111128098)
  cyclicdeadline-3290  [003] ....1..  3111.134789: sys_sched_yield()
  cyclicdeadline-3290  [003] d...2..  3111.134797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
  ....

As seen above, every sched_yield() call is followed by a sched_switch
event, so we believe that sched_yield() is actually resulting in a
switch. The values for nr_voluntary_switches/nr_involuntary_switches
in this scenario:

nr_switches                                  :               138753
nr_voluntary_switches                        :                    1
nr_involuntary_switches                      :               138752

Looking at __schedule() in kernel/sched/core.c, the switch is counted
as part of nr_involuntary_switches if the task has not been preempted
and the task is in the TASK_RUNNING state. This does not seem to
happen when sched_yield() is called.
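
For context, the relevant accounting in __schedule() looks roughly
like this (paraphrased from a v4.18-era kernel/sched/core.c, with
unrelated code elided):

  switch_count = &prev->nivcsw;           /* default: involuntary */
  if (!preempt && prev->state) {          /* not preempted, not RUNNING */
          ...
          deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
          ...
          switch_count = &prev->nvcsw;    /* counted as voluntary */
  }
  ...
  if (likely(prev != next)) {
          rq->nr_switches++;
          ...
          ++*switch_count;
  }

A task calling sched_yield() is still in TASK_RUNNING (prev->state ==
0), so it never reaches the nvcsw branch, and every yield is counted
in nr_involuntary_switches.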

Is there something we are missing over here? Or is this a known issue
that is planned to be fixed later?

Thanks,
Vedang Patel

 
 


* Re: yielding while running SCHED_DEADLINE
From: Juri Lelli @ 2018-09-17  9:26 UTC
  To: Patel, Vedang; +Cc: linux-rt-users, Koppolu, Chanakya, Peter Zijlstra

Hi,

On 14/09/18 23:13, Patel, Vedang wrote:
> Hi all, 
> 
> We have been playing around with SCHED_DEADLINE and found a
> discrepancy in how nr_involuntary_switches and nr_voluntary_switches
> are counted in /proc/${PID}/sched.
> 
> Whenever the task finishes its work early and executes sched_yield()
> to voluntarily give up the CPU, this increments
> nr_involuntary_switches. It should have incremented
> nr_voluntary_switches.

Mmm, I see what you are saying.

[...]

> Looking at __schedule() in kernel/sched/core.c, the switch is counted
> as part of nr_involuntary_switches if the task has not been preempted
> and the task is in the TASK_RUNNING state. This does not seem to
> happen when sched_yield() is called.

Mmm,

 - nr_voluntary_switches++ if !preempt && !RUNNING
 - nr_involuntary_switches++ otherwise (yield fits this as the task is
   still RUNNING, even though throttled for DEADLINE)

Not sure this is the same as what you say above..

> Is there something we are missing over here? Or is this a known issue
> that is planned to be fixed later?

.. however, I'm not sure. Peter, what do you say? It looks like we
might indeed want to account yield as a voluntary switch; it seems to
fit. In this case I guess we could use a flag or add a sched_ bit to
task_struct to handle the case?
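
Something like this, purely as a hypothetical sketch (sched_yielded is
a made-up field, nothing like it exists upstream, and the exact hook
point in do_sched_yield() is an assumption):

  /* in do_sched_yield(), before schedule() is called: */
  current->sched_yielded = 1;   /* hypothetical new bit in task_struct */

  /* in __schedule(): */
  switch_count = &prev->nivcsw;
  if (!preempt && prev->state) {
          /* existing blocking path, unchanged */
          switch_count = &prev->nvcsw;
  } else if (prev->sched_yielded) {
          /* a yielding task is still RUNNING, but the switch was its
           * own doing, so count it as voluntary instead */
          switch_count = &prev->nvcsw;
  }
  prev->sched_yielded = 0;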

Best,

- Juri


* Re: yielding while running SCHED_DEADLINE
From: Peter Zijlstra @ 2018-09-17 11:42 UTC
  To: Juri Lelli; +Cc: Patel, Vedang, linux-rt-users, Koppolu, Chanakya

On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> Hi,
> 
> On 14/09/18 23:13, Patel, Vedang wrote:
> > Hi all, 
> > 
> > We have been playing around with SCHED_DEADLINE and found a
> > discrepancy in how nr_involuntary_switches and
> > nr_voluntary_switches are counted in /proc/${PID}/sched.
> > 
> > Whenever the task finishes its work early and executes
> > sched_yield() to voluntarily give up the CPU, this increments
> > nr_involuntary_switches. It should have incremented
> > nr_voluntary_switches.
> 
> Mmm, I see what you are saying.
> 
> [...]
> 
> > Looking at __schedule() in kernel/sched/core.c, the switch is
> > counted as part of nr_involuntary_switches if the task has not been
> > preempted and the task is in the TASK_RUNNING state. This does not
> > seem to happen when sched_yield() is called.
> 
> Mmm,
> 
>  - nr_voluntary_switches++ if !preempt && !RUNNING
>  - nr_involuntary_switches++ otherwise (yield fits this as the task is
>    still RUNNING, even though throttled for DEADLINE)
> 
> Not sure this is the same as what you say above..
> 
> > Is there something we are missing over here? Or is this a known
> > issue that is planned to be fixed later?
> 
> .. however, I'm not sure. Peter, what do you say? It looks like we
> might indeed want to account yield as a voluntary switch; it seems to
> fit. In this case I guess we could use a flag or add a sched_ bit to
> task_struct to handle the case?

It's been like this _forever_ afaict. This isn't deadline specific
afaict; all yield callers will end up in non-voluntary switches.

I don't know anybody that cares and I don't think this is something
worth fixing. If someone did rely on this behaviour we'd break them,
and I'd much rather save a cycle than add more stupid stats crap to
the scheduler.


* Re: yielding while running SCHED_DEADLINE
From: Patel, Vedang @ 2018-09-17 17:14 UTC
  To: juri.lelli, peterz; +Cc: linux-rt-users, Koppolu, Chanakya

On Mon, 2018-09-17 at 13:42 +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> > Hi,
> > 
> > On 14/09/18 23:13, Patel, Vedang wrote:
> > > Hi all,
> > > 
> > > We have been playing around with SCHED_DEADLINE and found a
> > > discrepancy in how nr_involuntary_switches and
> > > nr_voluntary_switches are counted in /proc/${PID}/sched.
> > > 
> > > Whenever the task finishes its work early and executes
> > > sched_yield() to voluntarily give up the CPU, this increments
> > > nr_involuntary_switches. It should have incremented
> > > nr_voluntary_switches.
> > 
> > Mmm, I see what you are saying.
> > 
> > [...]
> > 
> > > Looking at __schedule() in kernel/sched/core.c, the switch is
> > > counted as part of nr_involuntary_switches if the task has not
> > > been preempted and the task is in the TASK_RUNNING state. This
> > > does not seem to happen when sched_yield() is called.
> > 
> > Mmm,
> > 
> >  - nr_voluntary_switches++ if !preempt && !RUNNING
> >  - nr_involuntary_switches++ otherwise (yield fits this as the task
> >    is still RUNNING, even though throttled for DEADLINE)
> > 
> > Not sure this is the same as what you say above..
> > 
> > > Is there something we are missing over here? Or is this a known
> > > issue that is planned to be fixed later?
> > 
> > .. however, I'm not sure. Peter, what do you say? It looks like we
> > might indeed want to account yield as a voluntary switch; it seems
> > to fit. In this case I guess we could use a flag or add a sched_
> > bit to task_struct to handle the case?
> 
> It's been like this _forever_ afaict. This isn't deadline specific
> afaict; all yield callers will end up in non-voluntary switches.
> 
> I don't know anybody that cares and I don't think this is something
> worth fixing. If someone did rely on this behaviour we'd break them,
> and I'd much rather save a cycle than add more stupid stats crap to
> the scheduler.
Thanks Peter and Juri for the response.

We will try to use a different mechanism to account for this.

-Vedang


* Re: yielding while running SCHED_DEADLINE
From: Bowles, Matthew K @ 2018-09-21  0:19 UTC
  To: linux-rt-users

I'm fine with not fixing this behavior since, as Vedang mentioned, we
can use different mechanisms to achieve the same goal. However, I
would like to go on the record as someone who cares about this
functionality. Assuming that sched_yield() was counted as part of
nr_voluntary_switches, the specific scenario in which I would find
this statistic useful is during debug of latency spikes on a realtime
thread. In particular, I could quickly correlate whether or not a
latency spike occurred due to being swapped out by the scheduler.

On Mon, 2018-09-17 at 13:42 +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> > Hi,
> > 
> > On 14/09/18 23:13, Patel, Vedang wrote:
> > > Hi all,
> > > 
> > > We have been playing around with SCHED_DEADLINE and found a
> > > discrepancy in how nr_involuntary_switches and
> > > nr_voluntary_switches are counted in /proc/${PID}/sched.
> > > 
> > > Whenever the task finishes its work early and executes
> > > sched_yield() to voluntarily give up the CPU, this increments
> > > nr_involuntary_switches. It should have incremented
> > > nr_voluntary_switches.
> > 
> > Mmm, I see what you are saying.
> > 
> > [...]
> > 
> > > Looking at __schedule() in kernel/sched/core.c, the switch is
> > > counted as part of nr_involuntary_switches if the task has not
> > > been preempted and the task is in the TASK_RUNNING state. This
> > > does not seem to happen when sched_yield() is called.
> > 
> > Mmm,
> > 
> >  - nr_voluntary_switches++ if !preempt && !RUNNING
> >  - nr_involuntary_switches++ otherwise (yield fits this as the task
> >    is still RUNNING, even though throttled for DEADLINE)
> > 
> > Not sure this is the same as what you say above..
> > 
> > > Is there something we are missing over here? Or is this a known
> > > issue that is planned to be fixed later?
> > 
> > .. however, I'm not sure. Peter, what do you say? It looks like we
> > might indeed want to account yield as a voluntary switch; it seems
> > to fit. In this case I guess we could use a flag or add a sched_
> > bit to task_struct to handle the case?
> 
> It's been like this _forever_ afaict. This isn't deadline specific
> afaict; all yield callers will end up in non-voluntary switches.
> 
> I don't know anybody that cares and I don't think this is something
> worth fixing. If someone did rely on this behaviour we'd break them,
> and I'd much rather save a cycle than add more stupid stats crap to
> the scheduler.

