* yielding while running SCHED_DEADLINE
@ 2018-09-14 23:13 Patel, Vedang
2018-09-17 9:26 ` Juri Lelli
0 siblings, 1 reply; 5+ messages in thread
From: Patel, Vedang @ 2018-09-14 23:13 UTC (permalink / raw)
To: linux-rt-users; +Cc: Koppolu, Chanakya
Hi all,
We have been playing around with SCHED_DEADLINE and found a
discrepancy in the calculation of nr_involuntary_switches and
nr_voluntary_switches in /proc/${PID}/sched.
Whenever the task finishes its work early and executes sched_yield()
to voluntarily give up the CPU, this increments
nr_involuntary_switches. It should have incremented
nr_voluntary_switches.
This can be easily demonstrated by running the cyclicdeadline task,
which is part of rt-tests
(https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/), and
checking the value of nr_voluntary_switches.
Please note that the issue seems to be with sched_yield() and not
SCHED_DEADLINE because we have seen similar behavior when we tried
switching to other policies. But we are using SCHED_DEADLINE because
it is one of the (very) few scenarios where sched_yield() can be used
correctly.
Some analysis:
--------------
I enabled the sched/sched_switch (with cyclicdeadline set as the
filter) and syscalls/sys_enter_sched_yield events to check whether each
sched_yield() call was resulting in a new task running. I got the
following results:
cyclicdeadline-3290 [003] ....... 3111.132786: tracing_mark_write: start at 3111125101 off=3 (period=3111125098 next=3111126098)
cyclicdeadline-3290 [003] ....1.. 3111.132789: sys_sched_yield()
cyclicdeadline-3290 [003] d...2.. 3111.132797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
cyclicdeadline-3290 [003] ....... 3111.133786: tracing_mark_write: start at 3111126101 off=3 (period=3111126098 next=3111127098)
cyclicdeadline-3290 [003] ....1.. 3111.133789: sys_sched_yield()
cyclicdeadline-3290 [003] d...2.. 3111.133797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
cyclicdeadline-3290 [003] ....... 3111.134786: tracing_mark_write: start at 3111127101 off=3 (period=3111127098 next=3111128098)
cyclicdeadline-3290 [003] ....1.. 3111.134789: sys_sched_yield()
cyclicdeadline-3290 [003] d...2.. 3111.134797: sched_switch: prev_comm=cyclicdeadline prev_pid=3290 prev_prio=-1 prev_state=R ==> next_comm=swapper/3 next_pid=0 next_prio=120
....
As seen above, every sched_yield() call is followed by a sched_switch
event. So we believe that sched_yield() is actually resulting in a
switch. The values for nr_voluntary_switches/nr_involuntary_switches in
this scenario:
nr_switches : 138753
nr_voluntary_switches : 1
nr_involuntary_switches : 138752
Looking at __schedule() in kernel/sched/core.c, the switch is counted
as part of nr_involuntary_switches if the task has not been preempted
and is in the TASK_RUNNING state. This does not seem to happen when
sched_yield() is called.
Is there something we are missing here? Or is this a known issue
that is planned to be fixed later?
Thanks,
Vedang Patel
* Re: yielding while running SCHED_DEADLINE
2018-09-14 23:13 yielding while running SCHED_DEADLINE Patel, Vedang
@ 2018-09-17 9:26 ` Juri Lelli
2018-09-17 11:42 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Juri Lelli @ 2018-09-17 9:26 UTC (permalink / raw)
To: Patel, Vedang; +Cc: linux-rt-users, Koppolu, Chanakya, Peter Zijlstra
Hi,
On 14/09/18 23:13, Patel, Vedang wrote:
> Hi all,
>
> We have been playing around with SCHED_DEADLINE and found a
> discrepancy in the calculation of nr_involuntary_switches and
> nr_voluntary_switches in /proc/${PID}/sched.
>
> Whenever the task finishes its work early and executes sched_yield()
> to voluntarily give up the CPU, this increments
> nr_involuntary_switches. It should have incremented
> nr_voluntary_switches.
Mmm, I see what you are saying.
[...]
> Looking at __schedule() in kernel/sched/core.c, the switch is counted
> as part of nr_involuntary_switches if the task has not been preempted
> and is in the TASK_RUNNING state. This does not seem to happen when
> sched_yield() is called.
Mmm,
- nr_voluntary_switches++ if !preempt && !RUNNING
- nr_involuntary_switches++ otherwise (yield fits this as the task is
still RUNNING, even though throttled for DEADLINE)
Not sure this is the same as what you say above..
> Is there something we are missing here? Or is this a known issue
> that is planned to be fixed later?
... however, not sure. Peter, what do you say? It looks like we might
indeed want to account yield as a voluntary switch; it seems to fit. In
this case I guess we could use a flag or add a sched_ bit to task_struct
to handle the case?
Best,
- Juri
* Re: yielding while running SCHED_DEADLINE
2018-09-17 9:26 ` Juri Lelli
@ 2018-09-17 11:42 ` Peter Zijlstra
2018-09-17 17:14 ` Patel, Vedang
2018-09-21 0:19 ` Bowles, Matthew K
0 siblings, 2 replies; 5+ messages in thread
From: Peter Zijlstra @ 2018-09-17 11:42 UTC (permalink / raw)
To: Juri Lelli; +Cc: Patel, Vedang, linux-rt-users, Koppolu, Chanakya
On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> Hi,
>
> On 14/09/18 23:13, Patel, Vedang wrote:
> > Hi all,
> >
> > We have been playing around with SCHED_DEADLINE and found a
> > discrepancy in the calculation of nr_involuntary_switches and
> > nr_voluntary_switches in /proc/${PID}/sched.
> >
> > Whenever the task finishes its work early and executes sched_yield()
> > to voluntarily give up the CPU, this increments
> > nr_involuntary_switches. It should have incremented
> > nr_voluntary_switches.
>
> Mmm, I see what you are saying.
>
> [...]
>
> > Looking at __schedule() in kernel/sched/core.c, the switch is counted
> > as part of nr_involuntary_switches if the task has not been preempted
> > and is in the TASK_RUNNING state. This does not seem to happen when
> > sched_yield() is called.
>
> Mmm,
>
> - nr_voluntary_switches++ if !preempt && !RUNNING
> > - nr_involuntary_switches++ otherwise (yield fits this as the task is
> >   still RUNNING, even though throttled for DEADLINE)
>
> Not sure this is the same as what you say above..
>
> > Is there something we are missing here? Or is this a known issue
> > that is planned to be fixed later?
>
> ... however, not sure. Peter, what do you say? It looks like we might
> indeed want to account yield as a voluntary switch; it seems to fit. In
> this case I guess we could use a flag or add a sched_ bit to
> task_struct to handle the case?
It's been like this _forever_ afaict. This isn't deadline specific
afaict; all yield callers will end up in non-voluntary switches.
I don't know anybody who cares, and I don't think this is something
worth fixing. If someone did rely on this behaviour we'd break them, and
I'd much rather save a cycle than add more stupid stats crap to the
scheduler.
* Re: yielding while running SCHED_DEADLINE
2018-09-17 11:42 ` Peter Zijlstra
@ 2018-09-17 17:14 ` Patel, Vedang
2018-09-21 0:19 ` Bowles, Matthew K
1 sibling, 0 replies; 5+ messages in thread
From: Patel, Vedang @ 2018-09-17 17:14 UTC (permalink / raw)
To: juri.lelli, peterz; +Cc: linux-rt-users, Koppolu, Chanakya
On Mon, 2018-09-17 at 13:42 +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> >
> > Hi,
> >
> > On 14/09/18 23:13, Patel, Vedang wrote:
> > >
> > > Hi all,
> > >
> > > We have been playing around with SCHED_DEADLINE and found a
> > > discrepancy in the calculation of nr_involuntary_switches and
> > > nr_voluntary_switches in /proc/${PID}/sched.
> > >
> > > Whenever the task finishes its work early and executes
> > > sched_yield() to voluntarily give up the CPU, this increments
> > > nr_involuntary_switches. It should have incremented
> > > nr_voluntary_switches.
> > Mmm, I see what you are saying.
> >
> > [...]
> >
> > >
> > > Looking at __schedule() in kernel/sched/core.c, the switch is counted
> > > as part of nr_involuntary_switches if the task has not been preempted
> > > and is in the TASK_RUNNING state. This does not seem to happen when
> > > sched_yield() is called.
> > Mmm,
> >
> > - nr_voluntary_switches++ if !preempt && !RUNNING
> > - nr_involuntary_switches++ otherwise (yield fits this as the task is
> >   still RUNNING, even though throttled for DEADLINE)
> >
> > Not sure this is the same as what you say above..
> >
> > >
> > > Is there something we are missing here? Or is this a known issue
> > > that is planned to be fixed later?
> > ... however, not sure. Peter, what do you say? It looks like we might
> > indeed want to account yield as a voluntary switch; it seems to fit.
> > In this case I guess we could use a flag or add a sched_ bit to
> > task_struct to handle the case?
> It's been like this _forever_ afaict. This isn't deadline specific
> afaict; all yield callers will end up in non-voluntary switches.
>
> I don't know anybody who cares, and I don't think this is something
> worth fixing. If someone did rely on this behaviour we'd break them, and
> I'd much rather save a cycle than add more stupid stats crap to the
> scheduler.
Thanks Peter and Juri for the response.
We will try to use a different mechanism to account for this.
-Vedang
* Re: yielding while running SCHED_DEADLINE
2018-09-17 11:42 ` Peter Zijlstra
2018-09-17 17:14 ` Patel, Vedang
@ 2018-09-21 0:19 ` Bowles, Matthew K
1 sibling, 0 replies; 5+ messages in thread
From: Bowles, Matthew K @ 2018-09-21 0:19 UTC (permalink / raw)
To: linux-rt-users
I’m fine with not fixing this behavior since, as Vedang has mentioned,
we can use different mechanisms to achieve the same goal. However, I
would like to go on the record as someone who cares about this
functionality.
Assuming that sched_yield() was counted as part of
nr_voluntary_switches, the specific scenario in which I would find this
statistic useful is during debugging of latency spikes on a realtime
thread. In particular, I could quickly correlate whether or not a
latency spike occurred due to being swapped out by the scheduler.
On Mon, 2018-09-17 at 13:42 +0200, Peter Zijlstra wrote:
> On Mon, Sep 17, 2018 at 11:26:48AM +0200, Juri Lelli wrote:
> >
> > Hi,
> >
> > On 14/09/18 23:13, Patel, Vedang wrote:
> > >
> > > Hi all,
> > >
> > > We have been playing around with SCHED_DEADLINE and found a
> > > discrepancy in the calculation of nr_involuntary_switches and
> > > nr_voluntary_switches in /proc/${PID}/sched.
> > >
> > > Whenever the task finishes its work early and executes
> > > sched_yield() to voluntarily give up the CPU, this increments
> > > nr_involuntary_switches. It should have incremented
> > > nr_voluntary_switches.
> > Mmm, I see what you are saying.
> >
> > [...]
> >
> > >
> > > Looking at __schedule() in kernel/sched/core.c, the switch is counted
> > > as part of nr_involuntary_switches if the task has not been preempted
> > > and is in the TASK_RUNNING state. This does not seem to happen when
> > > sched_yield() is called.
> > Mmm,
> >
> > - nr_voluntary_switches++ if !preempt && !RUNNING
> > - nr_involuntary_switches++ otherwise (yield fits this as the task is
> >   still RUNNING, even though throttled for DEADLINE)
> >
> > Not sure this is the same as what you say above..
> >
> > >
> > > Is there something we are missing here? Or is this a known issue
> > > that is planned to be fixed later?
> > ... however, not sure. Peter, what do you say? It looks like we might
> > indeed want to account yield as a voluntary switch; it seems to fit.
> > In this case I guess we could use a flag or add a sched_ bit to
> > task_struct to handle the case?
> It's been like this _forever_ afaict. This isn't deadline specific
> afaict; all yield callers will end up in non-voluntary switches.
>
> I don't know anybody who cares, and I don't think this is something
> worth fixing. If someone did rely on this behaviour we'd break them, and
> I'd much rather save a cycle than add more stupid stats crap to the
> scheduler.
Thanks Peter and Juri for the response.
We will try to use a different mechanism to account for this.
-Vedang