linux-kernel.vger.kernel.org archive mirror
* SCHED_DEADLINE with CPU affinity
@ 2019-11-19 22:20 Philipp Stanner
  2019-11-20  8:50 ` Juri Lelli
  0 siblings, 1 reply; 6+ messages in thread
From: Philipp Stanner @ 2019-11-19 22:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hagen Pfeifer, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman

Hey folks,
(please put me in CC when answering, I'm not subscribed)

I'm currently a working student in the embedded industry. We have a device where
we need to be able to process network data within a certain deadline. At the
same time, safety is a primary requirement; that's why we build everything
fully redundant. Meaning: we have two network interfaces, each interface's IRQ
bound to one CPU core, and for each we spawn a container (systemd-nspawn,
cgroups based) which in turn is bound to the corresponding CPU (via its CPU
affinity mask).

        Container0       Container1
   -----------------  -----------------
   |               |  |               |
   |    Proc. A    |  |   Proc. A'    |
   |    Proc. B    |  |   Proc. B'    |
   |               |  |               |
   -----------------  -----------------
          ^                  ^
          |                  |
        CPU 0              CPU 1
          |                  |
       IRQ eth0           IRQ eth1


Within each container several processes are started, ranging from systemd
(SCHED_OTHER) to two (soft) real-time critical processes, which we want to
run under SCHED_DEADLINE.

Now, I've worked through the manpage describing scheduling policies, and it
seems that our scenario is forbidden by the kernel.  I've done some tests with
the syscalls sched_setattr and sched_setaffinity, trying to activate
SCHED_DEADLINE while also binding to a certain core.  It fails with EINVAL or
EBUSY, depending on the order of the syscalls.
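
For reference, a minimal sketch of what we tried (modeled on the example
in sched_setattr(2); the runtime/deadline/period values are illustrative,
not our real parameters):

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* No glibc wrapper for sched_setattr(); layout as in the manpage. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	cpu_set_t set;
	struct sched_attr attr;

	/* Bind the calling thread to CPU 0 first... */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");

	/* ...then request SCHED_DEADLINE: 10 ms runtime per 100 ms period. */
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy   = SCHED_DEADLINE;
	attr.sched_runtime  = 10  * 1000 * 1000;	/* ns */
	attr.sched_deadline = 100 * 1000 * 1000;
	attr.sched_period   = 100 * 1000 * 1000;

	/* Rejected because the affinity mask does not span the whole root
	 * domain; we saw EINVAL or EBUSY depending on the call order. */
	if (syscall(SYS_sched_setattr, 0, &attr, 0))
		perror("sched_setattr");

	return 0;
}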

I've read that the kernel performs plausibility checks when you ask for a
new deadline task to be scheduled, and I assume this check is what prevents us
from implementing our intended architecture.

Now, the questions we're having are:

   1. Why does the kernel do this? What is the problem with pinning a
      SCHED_DEADLINE task to a certain core? In contrast, how is this
      handled on single-core systems? Why this artificial limitation?
   2. How can we possibly implement this? We don't want to use SCHED_FIFO,
      because out-of-control tasks would freeze the entire container.

SCHED_RR / SCHED_FIFO would probably be better policies than SCHED_OTHER,
but SCHED_DEADLINE is exactly what we are looking for.

Cheers,
Philipp


* Re: SCHED_DEADLINE with CPU affinity
  2019-11-19 22:20 SCHED_DEADLINE with CPU affinity Philipp Stanner
@ 2019-11-20  8:50 ` Juri Lelli
  2019-12-24 10:03   ` Philipp Stanner
  0 siblings, 1 reply; 6+ messages in thread
From: Juri Lelli @ 2019-11-20  8:50 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: linux-kernel, Hagen Pfeifer, mingo, peterz, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman

Hi Philipp,

On 19/11/19 23:20, Philipp Stanner wrote:
> Hey folks,
> (please put me in CC when answering, I'm not subscribed)
> 
> I'm currently a working student in the embedded industry. We have a device where
> we need to be able to process network data within a certain deadline. At the
> same time, safety is a primary requirement; that's why we build everything
> fully redundant. Meaning: we have two network interfaces, each interface's IRQ
> bound to one CPU core, and for each we spawn a container (systemd-nspawn,
> cgroups based) which in turn is bound to the corresponding CPU (via its CPU
> affinity mask).
> 
>         Container0       Container1
>    -----------------  -----------------
>    |               |  |               |
>    |    Proc. A    |  |   Proc. A'    |
>    |    Proc. B    |  |   Proc. B'    |
>    |               |  |               |
>    -----------------  -----------------
>           ^                  ^
>           |                  |
>         CPU 0              CPU 1
>           |                  |
>        IRQ eth0           IRQ eth1
> 
> 
> Within each container several processes are started, ranging from systemd
> (SCHED_OTHER) to two (soft) real-time critical processes, which we want to
> run under SCHED_DEADLINE.
> 
> Now, I've worked through the manpage describing scheduling policies, and it
> seems that our scenario is forbidden by the kernel.  I've done some tests with
> the syscalls sched_setattr and sched_setaffinity, trying to activate
> SCHED_DEADLINE while also binding to a certain core.  It fails with EINVAL or
> EBUSY, depending on the order of the syscalls.
> 
> I've read that the kernel performs plausibility checks when you ask for a

Yeah, admission control.

> new deadline task to be scheduled, and I assume this check is what prevents us
> from implementing our intended architecture.
> 
> Now, the questions we're having are:
> 
>    1. Why does the kernel do this? What is the problem with pinning a
>       SCHED_DEADLINE task to a certain core? In contrast, how is this
>       handled on single-core systems? Why this artificial limitation?

Please also have a look (you only mentioned the manpage, so in case you
missed it) at

https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667

The document in general should hopefully answer why we need admission
control and explain the current limitations regarding affinities.

>    2. How can we possibly implement this? We don't want to use SCHED_FIFO,
>       because out-of-control tasks would freeze the entire container.

I experimented a bit with this kind of setup myself in the past, and I
think I made it work by pre-configuring exclusive cpusets (similarly to
what is detailed in the doc above) and then starting containers inside
such exclusive sets with podman run's --cgroup-parent option.

I don't have proper instructions for how to do this yet (I plan to put
them together soon-ish), but please see if you can make it work with
this hint.
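
For what it's worth, an untested sketch of the pre-configuration step I
mean, following section 5 of the doc above (cgroup-v1 cpuset assumed
mounted at /sys/fs/cgroup/cpuset; the set name "rt0" and the CPU/memory
node numbers are just examples):

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Write a single value into a cpuset control file. */
static void put(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return;
	}
	if (fputs(val, f) == EOF)
		perror(path);
	fclose(f);
}

int main(void)
{
	/* Turn off load balancing in the root set so that child sets
	 * become isolated scheduling domains of their own. */
	put("/sys/fs/cgroup/cpuset/cpuset.sched_load_balance", "0");

	/* Exclusive set "rt0" owning CPU 0; a sibling set for the
	 * remaining CPUs would be created the same way. */
	if (mkdir("/sys/fs/cgroup/cpuset/rt0", 0755) && errno != EEXIST)
		perror("mkdir rt0");
	put("/sys/fs/cgroup/cpuset/rt0/cpuset.cpus", "0");
	put("/sys/fs/cgroup/cpuset/rt0/cpuset.mems", "0");
	put("/sys/fs/cgroup/cpuset/rt0/cpuset.cpu_exclusive", "1");

	return 0;
}

The container would then be started inside that set, e.g. via podman run's
--cgroup-parent pointing at it (again from memory, so double-check the
details).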

Best,

Juri



* Re: SCHED_DEADLINE with CPU affinity
  2019-11-20  8:50 ` Juri Lelli
@ 2019-12-24 10:03   ` Philipp Stanner
  2020-01-13  9:22     ` Juri Lelli
  0 siblings, 1 reply; 6+ messages in thread
From: Philipp Stanner @ 2019-12-24 10:03 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, Hagen Pfeifer, mingo, peterz, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman

On Wed, 20.11.2019, 09:50 +0100 Juri Lelli wrote:
> Hi Philipp,

Hey Juri,

thanks so far; we could indeed make it work with exclusive cpusets.

> On 19/11/19 23:20, Philipp Stanner wrote:
> 
> > from implementing our intended architecture.
> > 
> > Now, the questions we're having are:
> > 
> >    1. Why does the kernel do this? What is the problem with pinning a
> >       SCHED_DEADLINE task to a certain core? In contrast, how is this
> >       handled on single-core systems? Why this artificial limitation?
> 
> Please also have a look (you only mentioned the manpage, so in case you
> missed it) at
> 
> https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667
> 
> The document in general should hopefully answer why we need admission
> control and explain the current limitations regarding affinities.
> 
> >    2. How can we possibly implement this? We don't want to use
> >       SCHED_FIFO, because out-of-control tasks would freeze the
> >       entire container.
> 
> I experimented a bit with this kind of setup myself in the past, and I
> think I made it work by pre-configuring exclusive cpusets (similarly to
> what is detailed in the doc above) and then starting containers inside
> such exclusive sets with podman run's --cgroup-parent option.
> 
> I don't have proper instructions for how to do this yet (I plan to put
> them together soon-ish), but please see if you can make it work with
> this hint.

I fear I have not quite understood yet why this "workaround" leads to
(presumably) the same result as sched_setaffinity would. From what I have
read, I understand it as follows: for SCHED_DEADLINE, admission control
tries to guarantee that the requested policy can actually be executed. To
do so, it analyzes the current workload, taking especially the number of
cores into account.
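
(If I read the docs correctly, the check is a utilization-based test;
with the default rt-throttling values it is, schematically:

    sum_i (runtime_i / period_i) <= M * sched_rt_runtime_us / sched_rt_period_us
                                  = M * 0.95

where M is the number of CPUs in the root domain.)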

Now, with a pre-configured set, the kernel knows which tasks will run on
which core, so it is able to judge whether a process can be deadline
scheduled or not. But when using the default way, you could start your
processes as SCHED_OTHER, set SCHED_DEADLINE as the policy, and later many
of them could suddenly call sched_setaffinity, all wanting to run on the
same core, thereby provoking collisions.

Is my understanding of the situation correct?

Merry Christmas,
P.



* Re: SCHED_DEADLINE with CPU affinity
  2019-12-24 10:03   ` Philipp Stanner
@ 2020-01-13  9:22     ` Juri Lelli
  0 siblings, 0 replies; 6+ messages in thread
From: Juri Lelli @ 2020-01-13  9:22 UTC (permalink / raw)
  To: Philipp Stanner
  Cc: linux-kernel, Hagen Pfeifer, mingo, peterz, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman

Hi,

Sorry for the delay in replying (Xmas + catching up w/ emails).

On 24/12/19 11:03, Philipp Stanner wrote:
> On Wed, 20.11.2019, 09:50 +0100 Juri Lelli wrote:
> > Hi Philipp,
> 
> Hey Juri,
> 
> thanks so far; we could indeed make it work with exclusive cpusets.

Good. :-)

> > On 19/11/19 23:20, Philipp Stanner wrote:
> > 
> > > from implementing our intended architecture.
> > > 
> > > Now, the questions we're having are:
> > > 
> > >    1. Why does the kernel do this? What is the problem with pinning
> > >       a SCHED_DEADLINE task to a certain core? In contrast, how is
> > >       this handled on single-core systems? Why this artificial
> > >       limitation?
> > 
> > Please also have a look (you only mentioned the manpage, so in case
> > you missed it) at
> > 
> > https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667
> > 
> > The document in general should hopefully answer why we need admission
> > control and explain the current limitations regarding affinities.
> > 
> > >    2. How can we possibly implement this? We don't want to use
> > >       SCHED_FIFO, because out-of-control tasks would freeze the
> > >       entire container.
> > 
> > I experimented a bit with this kind of setup myself in the past, and I
> > think I made it work by pre-configuring exclusive cpusets (similarly to
> > what is detailed in the doc above) and then starting containers inside
> > such exclusive sets with podman run's --cgroup-parent option.
> > 
> > I don't have proper instructions for how to do this yet (I plan to put
> > them together soon-ish), but please see if you can make it work with
> > this hint.
> 
> I fear I have not quite understood yet why this "workaround" leads to
> (presumably) the same result as sched_setaffinity would. From what I have
> read, I understand it as follows: for SCHED_DEADLINE, admission control
> tries to guarantee that the requested policy can actually be executed. To
> do so, it analyzes the current workload, taking especially the number of
> cores into account.
> 
> Now, with a pre-configured set, the kernel knows which tasks will run on
> which core, so it is able to judge whether a process can be deadline
> scheduled or not. But when using the default way, you could start your
> processes as SCHED_OTHER, set SCHED_DEADLINE as the policy, and later many
> of them could suddenly call sched_setaffinity, all wanting to run on the
> same core, thereby provoking collisions.

But setting affinity would still have to pass admission control, and
should fail in the case you are describing (IIUC).

https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L5433

Best,

Juri



* Re: SCHED_DEADLINE with CPU affinity
  2020-01-14  9:44 stanner
@ 2020-01-15  8:10 ` Juri Lelli
  0 siblings, 0 replies; 6+ messages in thread
From: Juri Lelli @ 2020-01-15  8:10 UTC (permalink / raw)
  To: stanner
  Cc: linux-kernel, Hagen Pfeifer, mingo, peterz, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman

On 14/01/20 10:44, stanner@posteo.de wrote:
> 
> 
> On 13.01.2020 10:22, Juri Lelli wrote:
> > Hi,
> > 
> > Sorry for the delay in replying (Xmas + catching up w/ emails).
> 
> No worries
> 
> > > I fear I have not quite understood yet why this "workaround" leads to
> > > (presumably) the same result as sched_setaffinity would. From what I
> > > have read, I understand it as follows: for SCHED_DEADLINE, admission
> > > control tries to guarantee that the requested policy can actually be
> > > executed. To do so, it analyzes the current workload, taking
> > > especially the number of cores into account.
> > > 
> > > Now, with a pre-configured set, the kernel knows which tasks will run
> > > on which core, so it is able to judge whether a process can be
> > > deadline scheduled or not. But when using the default way, you could
> > > start your processes as SCHED_OTHER, set SCHED_DEADLINE as the
> > > policy, and later many of them could suddenly call sched_setaffinity,
> > > all wanting to run on the same core, thereby provoking collisions.
> > 
> > But setting affinity would still have to pass admission control, and
> > should fail in the case you are describing (IIUC).
> > 
> > https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L5433
> 
> Well, no, that's not what I meant. I understand that the kernel
> currently rejects the combination of sched_setaffinity and
> sched_setattr. My question, basically, is: why does it work with
> exclusive cpusets?
> 
> As I wrote above, I assume the difference is that the kernel knows
> beforehand which programs will run on which core and can therefore
> check the rules of admission control, whereas without exclusive cpusets
> it could happen at any time that other deadline applications decide to
> switch cores manually, causing collisions with a deadline task already
> running on that core.
> 
> You originally wrote that this solution is "currently" required; that's
> why I assume that, in theory, the admission control check could also be
> done dynamically when sched_setattr or sched_setaffinity are called
> (after each other, without exclusive cpusets).
> 
> Have I been clear enough now? Basically, I want to know why
> cpusets+sched_deadline works whereas setaffinity+sched_deadline is
> rejected, although both seem to lead to the same result.

Oh, OK, I think I got the question now (sorry :-).

So, (exclusive) cpusets define "isolated" domains of CPUs among which
DEADLINE tasks can freely migrate following the global EDF rule:

https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L413

Relaxing this constraint and allowing users to define per-task affinities
(via sched_setaffinity) can lead to situations where the affinity masks of
different tasks overlap, and this creates problems for the admission
control checks that are currently implemented.

Theory that tackles the problem of overlapping affinities has already been
developed, but it is not yet implemented (it adds complexity that would
have to be justified by strong use cases).
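
To give a made-up numeric example: on a 2-CPU root domain, three tasks
with utilization 0.6 each pass the global admission test (0.6 * 3 = 1.8
<= 2 * 0.95), but if two of them were additionally pinned to the same
CPU, that CPU would be overloaded at 1.2 while the global test still
holds. Catching that requires admission control to reason about the
affinity masks themselves, which is the added complexity I mentioned.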

Best,

Juri



* Re: SCHED_DEADLINE with CPU affinity
@ 2020-01-14  9:44 stanner
  2020-01-15  8:10 ` Juri Lelli
  0 siblings, 1 reply; 6+ messages in thread
From: stanner @ 2020-01-14  9:44 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, Hagen Pfeifer, mingo, peterz, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman



On 13.01.2020 10:22, Juri Lelli wrote:
> Hi,
> 
> Sorry for the delay in replying (Xmas + catching up w/ emails).

No worries

>> I fear I have not quite understood yet why this "workaround" leads to
>> (presumably) the same result as sched_setaffinity would. From what I
>> have read, I understand it as follows: for SCHED_DEADLINE, admission
>> control tries to guarantee that the requested policy can actually be
>> executed. To do so, it analyzes the current workload, taking especially
>> the number of cores into account.
>> 
>> Now, with a pre-configured set, the kernel knows which tasks will run
>> on which core, so it is able to judge whether a process can be deadline
>> scheduled or not. But when using the default way, you could start your
>> processes as SCHED_OTHER, set SCHED_DEADLINE as the policy, and later
>> many of them could suddenly call sched_setaffinity, all wanting to run
>> on the same core, thereby provoking collisions.
> 
> But setting affinity would still have to pass admission control, and
> should fail in the case you are describing (IIUC).
> 
> https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L5433

Well, no, that's not what I meant. I understand that the kernel
currently rejects the combination of sched_setaffinity and
sched_setattr. My question, basically, is: why does it work with
exclusive cpusets?

As I wrote above, I assume the difference is that the kernel knows
beforehand which programs will run on which core and can therefore check
the rules of admission control, whereas without exclusive cpusets it
could happen at any time that other deadline applications decide to
switch cores manually, causing collisions with a deadline task already
running on that core.

You originally wrote that this solution is "currently" required; that's
why I assume that, in theory, the admission control check could also be
done dynamically when sched_setattr or sched_setaffinity are called
(after each other, without exclusive cpusets).

Have I been clear enough now? Basically, I want to know why
cpusets+sched_deadline works whereas setaffinity+sched_deadline is
rejected, although both seem to lead to the same result.

P.


end of thread

Thread overview: 6+ messages
2019-11-19 22:20 SCHED_DEADLINE with CPU affinity Philipp Stanner
2019-11-20  8:50 ` Juri Lelli
2019-12-24 10:03   ` Philipp Stanner
2020-01-13  9:22     ` Juri Lelli
2020-01-14  9:44 stanner
2020-01-15  8:10 ` Juri Lelli
