* On migrate_disable() and latencies
@ 2011-07-22 10:19 Peter Zijlstra
  2011-07-22 10:49 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-07-22 10:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-rt-users, Ingo Molnar, Carsten Emde, Clark Williams,
	Paul E. McKenney, Kumar Gala, Ralf Baechle, rostedt

On Wed, 2011-07-20 at 02:37 +0200, Thomas Gleixner wrote:
>    - Twist your brain around the schedulability impact of the
>      migrate_disable() approach.
> 
>      A really interesting research topic for our friends from the
>      academic universe. Relevant and conclusive (even short notice)
>      papers and/or talks on that topic have a reserved slot in the
>      Kernel developers track at the Realtime Linux Workshop in Prague
>      in October this year. 

From what I can tell it can induce a latency in the order of
max-migrate-disable-period * nr-cpus.

The scenario is one where you stack N migrate-disable tasks on a run
queue (necessarily of increasing priority). Doing this requires all cpus
in the system to be busy, for otherwise the task would simply be
moved to another cpu.

Anyway, once you manage to stack these migrate-disable tasks, all other
tasks go to sleep, leaving a vacuum. Normally we would migrate tasks to
fill the vacuum left by the tasks going to sleep, but clearly
migrate-disable prohibits this.

So we have this stack of migrate-disable tasks and M-1 idle cpus, where
M is the number of cpus (a loss of utilization). Now it takes the length
of the migrate-disable region of the highest priority task on the stack
(the one running) to complete and enable migration again. This will
instantly move the task away to an idle cpu. This will then need to
happen min(N-1, M-1) times before the lowest priority migrate_disable
task can run again or all cpus are busy.

Therefore the worst case latency is in the order of
max-migrate-disable-period * nr-cpus.
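Spelled out as a toy calculation (a sketch, not kernel code; the helper name and units are illustrative):

```python
def worst_case_latency_us(max_md_period_us, nr_cpus, stack_depth):
    """Upper-bound sketch for the lowest-priority task in a stack of
    migrate-disable tasks: each unwind step can cost up to one full
    migrate-disable period, and min(N-1, M-1) such steps must happen
    before that task can either run or migrate."""
    steps = min(stack_depth - 1, nr_cpus - 1)
    return max_md_period_us * steps

# 4 cpus, a full stack of 4 tasks, 100us max migrate-disable period
print(worst_case_latency_us(100, 4, 4))  # -> 300
```

With a full stack (stack_depth == nr_cpus) this reduces to roughly max-migrate-disable-period * nr-cpus, matching the bound above.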

Currently we have no means of measuring these latencies; this is
something we need to grow. I think Steven can fairly easily craft a
migrate_disable runtime tracer -- it needs to use t->se.sum_exec_runtime
as its measure so as to only count the actual time spent on the task and
ignore any time it was blocked.
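The measurement idea can be sketched in user space (illustrative only; the values fed in stand for the kernel's t->se.sum_exec_runtime field, and the class is hypothetical):

```python
class MigrateDisableTracer:
    """Sketch of the proposed tracer logic: sample the task's
    accumulated *execution* time on entry and exit of the region, so
    any time the task spent blocked or preempted inside the region is
    automatically excluded from the measured period."""

    def __init__(self):
        self.max_period = 0

    def migrate_disable(self, sum_exec_runtime):
        # snapshot executed time at region entry
        self._start = sum_exec_runtime

    def migrate_enable(self, sum_exec_runtime):
        # delta of executed time only, as suggested in the text
        period = sum_exec_runtime - self._start
        self.max_period = max(self.max_period, period)
        return period
```

E.g. a task entering at 1000ns of accumulated runtime and leaving at 1250ns records a 250ns region, however long it was blocked in between.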

Once we have this, it's back to the old game of 'lock'-breaking.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: On migrate_disable() and latencies
  2011-07-22 10:19 On migrate_disable() and latencies Peter Zijlstra
@ 2011-07-22 10:49 ` Peter Zijlstra
  2011-07-22 14:57   ` Peter Zijlstra
  2011-07-22 17:45 ` Nicholas Mc Guire
  2011-07-23  0:39 ` Paul E. McKenney
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-07-22 10:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-rt-users, Ingo Molnar, Carsten Emde, Clark Williams,
	Paul E. McKenney, Kumar Gala, Ralf Baechle, rostedt

On Fri, 2011-07-22 at 12:19 +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 02:37 +0200, Thomas Gleixner wrote:
> >    - Twist your brain around the schedulability impact of the
> >      migrate_disable() approach.
> > 
> >      A really interesting research topic for our friends from the
> >      academic universe. Relevant and conclusive (even short notice)
> >      papers and/or talks on that topic have a reserved slot in the
> >      Kernel developers track at the Realtime Linux Workshop in Prague
> >      in October this year. 
> 
> From what I can tell it can induce a latency in the order of
> max-migrate-disable-period * nr-cpus.
> 
> The scenario is one where you stack N migrate-disable tasks on a run
> queue (necessarily of increasing priority). Doing this requires all cpus
> in the system to be busy, for otherwise the task would simply be
> moved to another cpu.

This implies it requires at least nr-cpus^2 tasks to pull this off. So
if you want to add the nr-tasks constraint, the actual limit is something
like:

   max-migrate-disable-period * min(nr-cpus, nr-tasks / nr_cpus);
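As a sketch (hypothetical helper; integer division stands in for nr-tasks / nr-cpus):

```python
def latency_bound_us(max_md_period_us, nr_cpus, nr_tasks):
    """Refined bound from the text: the achievable stack depth is
    limited both by the number of cpus and by how many tasks are
    available per cpu."""
    return max_md_period_us * min(nr_cpus, nr_tasks // nr_cpus)

print(latency_bound_us(100, 4, 8))  # -> 200
```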

> Anyway, once you manage to stack these migrate-disable tasks, all other
> tasks go to sleep, leaving a vacuum. Normally we would migrate tasks to
> fill the vacuum left by the tasks going to sleep, but clearly
> migrate-disable prohibits this.
> 
> So we have this stack of migrate-disable tasks and M-1 idle cpus (loss
> of utilization). Now it takes the length of the migrate-disable region
> of the highest priority task on the stack (the one running) to complete
> and enable migration again. This will instantly move the task away to an
> idle cpu. This will then need to happen min(N-1, M-1) times before the
> lowest priority migrate_disable task can run again or all cpus are busy.
> 
> Therefore the worst case latency is in the order of
> max-migrate-disable-period * nr-cpus.
> 
> Currently we have no means of measuring these latencies; this is
> something we need to grow. I think Steven can fairly easily craft a
> migrate_disable runtime tracer -- it needs to use t->se.sum_exec_runtime
> as its measure so as to only count the actual time spent on the task and
> ignore any time it was blocked.
> 
> Once we have this, it's back to the old game of 'lock'-breaking.
> 
> 



* Re: On migrate_disable() and latencies
  2011-07-22 10:49 ` Peter Zijlstra
@ 2011-07-22 14:57   ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-07-22 14:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-rt-users, Ingo Molnar, Carsten Emde, Clark Williams,
	Paul E. McKenney, Kumar Gala, Ralf Baechle, rostedt

On Fri, 2011-07-22 at 12:49 +0200, Peter Zijlstra wrote:
> > The scenario is one where you stack N migrate-disable tasks on a run
> > queue (necessarily of increasing priority). Doing this requires all cpus
> > in the system to be busy, for otherwise the task would simply be
> > moved to another cpu.
> 
> This implies it requires at least nr-cpus^2 tasks to pull this off. 

OK, scrap that, 2*nr-cpus is enough.
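The head-count behind that correction, as a sketch (the bookkeeping of which task keeps which cpu busy is simplified; the helper is illustrative):

```python
def tasks_needed(nr_cpus):
    """Worked count behind '2*nr-cpus is enough': nr_cpus tasks of
    increasing priority stacked on one runqueue, plus one task keeping
    each of the remaining nr_cpus - 1 cpus busy."""
    stacked = nr_cpus
    keeping_others_busy = nr_cpus - 1
    return stacked + keeping_others_busy  # 2*nr_cpus - 1

print(tasks_needed(4))  # -> 7
```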


* Re: On migrate_disable() and latencies
  2011-07-22 10:19 On migrate_disable() and latencies Peter Zijlstra
  2011-07-22 10:49 ` Peter Zijlstra
@ 2011-07-22 17:45 ` Nicholas Mc Guire
  2011-07-25  8:21   ` Peter Zijlstra
  2011-07-23  0:39 ` Paul E. McKenney
  2 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2011-07-22 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Paul E. McKenney, Kumar Gala, Ralf Baechle,
	rostedt

On Fri, 22 Jul 2011, Peter Zijlstra wrote:

> On Wed, 2011-07-20 at 02:37 +0200, Thomas Gleixner wrote:
> >    - Twist your brain around the schedulability impact of the
> >      migrate_disable() approach.
> > 
> >      A really interesting research topic for our friends from the
> >      academic universe. Relevant and conclusive (even short notice)
> >      papers and/or talks on that topic have a reserved slot in the
> >      Kernel developers track at the Realtime Linux Workshop in Prague
> >      in October this year. 
> 
> From what I can tell it can induce a latency in the order of
> max-migrate-disable-period * nr-cpus.
> 
> The scenario is one where you stack N migrate-disable tasks on a run
> queue (necessarily of increasing priority). Doing this requires all cpus
> in the system to be busy, for otherwise the task would simply be
> moved to another cpu.
> 
> Anyway, once you manage to stack these migrate-disable tasks, all other
> tasks go to sleep, leaving a vacuum. Normally we would migrate tasks to
> fill the vacuum left by the tasks going to sleep, but clearly
> migrate-disable prohibits this.
> 
> So we have this stack of migrate-disable tasks and M-1 idle cpus (loss
> of utilization). Now it takes the length of the migrate-disable region
> of the highest priority task on the stack (the one running) to complete
> and enable migration again. This will instantly move the task away to an
> idle cpu. This will then need to happen min(N-1, M-1) times before the
> lowest priority migrate_disable task can run again or all cpus are busy.
> 
> Therefore the worst case latency is in the order of
> max-migrate-disable-period * nr-cpus.

+ something like the sum of (interrupt rate [n] / max-migrate-disable-period * nr-cpus) * top-half handler [n]. If you go on with theoretical WCET analysis on multicore systems you will always end up at the conclusion that only UP is suitable for RT....

> 
> Currently we have no means of measuring these latencies; this is
> something we need to grow. I think Steven can fairly easily craft a
> migrate_disable runtime tracer -- it needs to use t->se.sum_exec_runtime
> as its measure so as to only count the actual time spent on the task and
> ignore any time it was blocked.
> 
well this is a similar problem as with the WCET "calculations" - you can
calculate theoretical worst cases - but the question is what the actual
distribution of "stacking" is, and thus what the probability is that you
manage to stack tasks in this way. A further issue here is the system
design - if you have an RT system with M RT tasks, an unknown number
of non-RT tasks (dynamic) and N cores, then it would be quite a logical
system design to pin some of the RT tasks to CPUs and thus make such
stacking scenarios unlikely or even impossible, while at the same time
retaining the full load balancing of the non-RT tasks. If you want to
ensure determinism, then relying on migration and the availability of
idle CPUs to migrate to is in my opinion a design problem that needs to
be resolved at the level of the respective RT task set.

If you are not talking about hard real-time guarantees, then the question
of how probable such a scenario is most likely provides a sufficient
guarantee.

One could, I guess, put some relatively simple instrumentation in to
monitor this stacking problem - quite independent of actually measuring
the times.
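Such instrumentation could look roughly like this user-space model (purely illustrative; the real thing would live in the scheduler, and the class name is hypothetical):

```python
class StackingMonitor:
    """Minimal sketch of the instrumentation hinted at here: a per-cpu
    count of tasks currently inside a migrate-disable section, plus a
    high-water mark, with no timing involved at all."""

    def __init__(self, nr_cpus):
        self.depth = [0] * nr_cpus   # current stack depth per cpu
        self.max_depth = 0           # worst stacking ever observed

    def enter(self, cpu):
        # called on migrate_disable()
        self.depth[cpu] += 1
        self.max_depth = max(self.max_depth, self.depth[cpu])

    def exit(self, cpu):
        # called on migrate_enable()
        self.depth[cpu] -= 1
```

If max_depth stays at 1 in practice, the stacking scenario never actually occurs.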

> Once we have this, it's back to the old game of 'lock'-breaking.
>
if the stacking problem does not practically exist then it might not be worth
the effort to resolve it with elaborate lock breaking.

hofrat


* Re: On migrate_disable() and latencies
  2011-07-22 10:19 On migrate_disable() and latencies Peter Zijlstra
  2011-07-22 10:49 ` Peter Zijlstra
  2011-07-22 17:45 ` Nicholas Mc Guire
@ 2011-07-23  0:39 ` Paul E. McKenney
  2011-07-25  8:30   ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2011-07-23  0:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Kumar Gala, Ralf Baechle, rostedt

On Fri, Jul 22, 2011 at 12:19:52PM +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 02:37 +0200, Thomas Gleixner wrote:
> >    - Twist your brain around the schedulability impact of the
> >      migrate_disable() approach.
> > 
> >      A really interesting research topic for our friends from the
> >      academic universe. Relevant and conclusive (even short notice)
> >      papers and/or talks on that topic have a reserved slot in the
> >      Kernel developers track at the Realtime Linux Workshop in Prague
> >      in October this year. 
> 
> From what I can tell it can induce a latency in the order of
> max-migrate-disable-period * nr-cpus.
> 
> The scenario is one where you stack N migrate-disable tasks on a run
> queue (necessarily of increasing priority). Doing this requires all cpus
> in the system to be busy, for otherwise the task would simply be
> moved to another cpu.
> 
> Anyway, once you manage to stack these migrate-disable tasks, all other
> tasks go to sleep, leaving a vacuum. Normally we would migrate tasks to
> fill the vacuum left by the tasks going to sleep, but clearly
> migrate-disable prohibits this.
> 
> So we have this stack of migrate-disable tasks and M-1 idle cpus (loss
> of utilization). Now it takes the length of the migrate-disable region
> of the highest priority task on the stack (the one running) to complete
> and enable migration again. This will instantly move the task away to an
> idle cpu. This will then need to happen min(N-1, M-1) times before the
> lowest priority migrate_disable task can run again or all cpus are busy.
> 
> Therefore the worst case latency is in the order of
> max-migrate-disable-period * nr-cpus.

OK, but wouldn't that be the latency as seen by the lowest-priority
task?  Or are migrate-disable tasks given preferential treatment?
If not, a prio-99 task would get the same latency either way, right?

Migration-disable can magnify the latency seen by low-priority tasks, if
I understand correctly.  If you disabled preemption, when a low-priority
task became runnable, it would find an idle CPU.  But with migration
disable, the lowest-priority task might enter a migration-disable region,
then be preempted by a marginally higher-priority task that also enters
a migration-disable region, and is also preempted, and so on.  The
lowest-priority task cannot run on the current CPU because of all
the higher-priority tasks, and cannot migrate due to being in a
migration-disable section.

In other words, as is often the case, better worst-case service to
the high-priority tasks can multiply the latency seen by the
low-priority tasks.

So is the topic to quantify this?  If so, my take is that the latency
to the highest-priority task decreases by an amount roughly equal to
the duration of the longest preempt_disable() region that turned into a
migration-disable region, while that to the lowest-priority task increases
by N-1 times the CPU overhead of the longest migration-disable region,
plus context switches.  (Yes, this is a very crude rule-of-thumb model.
A real model would have much higher mathematics and might use a more
detailed understanding of the workload.)
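That rule of thumb, written out as a sketch (all parameters are hypothetical inputs, not measurements; the helper is illustrative):

```python
def latency_shift_us(longest_pd_us, longest_md_us, n_tasks, ctx_switch_us):
    """Crude rule-of-thumb model from the text: the top-priority task
    gains roughly the longest former preempt-disable region, while the
    bottom-priority task loses roughly N-1 migrate-disable regions
    plus the associated context switches."""
    high_prio_delta = -longest_pd_us
    low_prio_delta = (n_tasks - 1) * (longest_md_us + ctx_switch_us)
    return high_prio_delta, low_prio_delta

# 50us former preempt-disable region, 100us migrate-disable region,
# 4 stacked tasks, 5us per context switch
print(latency_shift_us(50, 100, 4, 5))  # -> (-50, 315)
```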

Or am I misunderstanding how all this works?

							Thanx, Paul

> Currently we have no means of measuring these latencies; this is
> something we need to grow. I think Steven can fairly easily craft a
> migrate_disable runtime tracer -- it needs to use t->se.sum_exec_runtime
> as its measure so as to only count the actual time spent on the task and
> ignore any time it was blocked.
> 
> Once we have this, it's back to the old game of 'lock'-breaking.
> 
> 


* Re: On migrate_disable() and latencies
  2011-07-22 17:45 ` Nicholas Mc Guire
@ 2011-07-25  8:21   ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-07-25  8:21 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Paul E. McKenney, Kumar Gala, Ralf Baechle,
	rostedt

On Fri, 2011-07-22 at 19:45 +0200, Nicholas Mc Guire wrote:

> > Therefore the worst case latency is in the order of
> > max-migrate-disable-period * nr-cpus.
> 
> + something like sum of (interrupt rate [n] /
> max-migrate-disable-period * nr-cpus) * top-half handler [n]. if you
> go on with theoretical WCET analysis on multicore systems you will
> always end up at the conclusion that only UP is suitable for RT....

+ preemption cost + migration costs etc.. :-)

> > Currently we have no means of measuring these latencies; this is
> > something we need to grow. I think Steven can fairly easily craft a
> > migrate_disable runtime tracer -- it needs to use t->se.sum_exec_runtime
> > as its measure so as to only count the actual time spent on the task and
> > ignore any time it was blocked.
>  
> well this is a similar problem as with the WCET "calculations" - you can
> calculate theoretical worst cases - but the question is what the actual
> distribution of "stacking" is and thus what the probability is that you
> manage to stack tasks in this way.

Sure, and yes those are things to look at.

> One could, I guess, put some relatively simple instrumentation in to
> monitor this stacking problem - quite independent of actually measuring
> the times.

Yes, that would be very easy to do.

> > Once we have this, it's back to the old game of 'lock'-breaking.
> >
> if the stacking problem does not practically exist then it might not be worth
> the effort to resolve it with elaborate lock breaking.

Fully agreed, it's just that I wanted to share my understanding of the
problem space (both Thomas and I thought on-list email was a better
location than private IRC logs ;-), and we need to get to know the
problem before we can decide to ignore it ;-)

Ignoring unknowns always leads to surprises.



* Re: On migrate_disable() and latencies
  2011-07-23  0:39 ` Paul E. McKenney
@ 2011-07-25  8:30   ` Peter Zijlstra
  2011-07-25 21:17     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-07-25  8:30 UTC (permalink / raw)
  To: paulmck
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Kumar Gala, Ralf Baechle, rostedt

On Fri, 2011-07-22 at 17:39 -0700, Paul E. McKenney wrote:
> > Therefore the worst case latency is in the order of
> > max-migrate-disable-period * nr-cpus.
> 
> OK, but wouldn't that be the latency as seen by the lowest-priority
> task?

It would indeed; the utility loss is added to the preemption cost of the
lower priority tasks.

> Or are migrate-disable tasks given preferential treatment?
> If not, a prio-99 task would get the same latency either way, right?

Right.

> Migration-disable can magnify the latency seen by low-priority tasks, if
> I understand correctly.  If you disabled preemption, when a low-priority
> task became runnable, it would find an idle CPU.  But with migration
> disable, the lowest-priority task might enter a migration-disable region,
> then be preempted by a marginally higher-priority task that also enters
> a migration-disable region, and is also preempted, and so on.  The
> lowest-priority task cannot run on the current CPU because of all
> the higher-priority tasks, and cannot migrate due to being in a
> migration-disable section.

Exactly so.

> In other words, as is often the case, better worst-case service to
> the high-priority tasks can multiply the latency seen by the
> low-priority tasks.
> 
> So is the topic to quantify this?  

I suppose it is indeed. Even for the SoftRT case we need to make sure
the total utilization loss is indeed acceptable.

> If so, my take is that the latency
> to the highest-priority task decreases by an amount roughly equal to
> the duration of the longest preempt_disable() region that turned into a
> migration-disable region, while that to the lowest-priority task increases
> by N-1 times the CPU overhead of the longest migration-disable region,
> plus context switches.  (Yes, this is a very crude rule-of-thumb model.
> A real model would have much higher mathematics and might use a more
> detailed understanding of the workload.)
> 
> Or am I misunderstanding how all this works? 

No, I think you're gettin' it.

My main worry with all this is that we have these insanely long !preempt
regions in mainline that are now !migrate regions, and thus per all the
above we could be looking at a substantial utilization loss.

Alternatively we could all be missing something far more horrid, but
that might just be my paranoia talking.



* Re: On migrate_disable() and latencies
  2011-07-25  8:30   ` Peter Zijlstra
@ 2011-07-25 21:17     ` Paul E. McKenney
  2011-07-27 11:13       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2011-07-25 21:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Kumar Gala, Ralf Baechle, rostedt

On Mon, Jul 25, 2011 at 10:30:53AM +0200, Peter Zijlstra wrote:
> On Fri, 2011-07-22 at 17:39 -0700, Paul E. McKenney wrote:
> > > Therefore the worst case latency is in the order of
> > > max-migrate-disable-period * nr-cpus.
> > 
> > OK, but wouldn't that be the latency as seen by the lowest-priority
> > task?
> 
> It would indeed, the utility loss is added to the preemption cost of the
> lower priority tasks.

OK.

> > Or are migrate-disable tasks given preferential treatment?
> > If not, a prio-99 task would get the same latency either way, right?
> 
> Right.
> 
> > Migration-disable can magnify the latency seen by low-priority tasks, if
> > I understand correctly.  If you disabled preemption, when a low-priority
> > task became runnable, it would find an idle CPU.  But with migration
> > disable, the lowest-priority task might enter a migration-disable region,
> > then be preempted by a marginally higher-priority task that also enters
> > a migration-disable region, and is also preempted, and so on.  The
> > lowest-priority task cannot run on the current CPU because of all
> > the higher-priority tasks, and cannot migrate due to being in a
> > migration-disable section.
> 
> Exactly so.
> 
> > In other words, as is often the case, better worst-case service to
> > the high-priority tasks can multiply the latency seen by the
> > low-priority tasks.
> > 
> > So is the topic to quantify this?  
> 
> I suppose it is indeed. Even for the SoftRT case we need to make sure
> the total utilization loss is indeed acceptable.

OK.  If you are doing strict priority, then everything below the highest
priority is workload dependent.  OK, OK, given Linux's throttling,
everything below the two highest priorities is workload dependent
(assuming I understand the throttling correctly).  The higher-priority
tasks can absolutely starve the lower-priority ones, with or without
the migrate-disable capability.

Another way of looking at it is from the viewpoint of the additional
priority-boost events.  If preemption is disabled, the low-priority task
will execute through the preempt-disable region without context switching.
In contrast, given a migration-disable region, the low-priority task
might be preempted and then boosted.  (If I understand correctly, if some
higher-priority task tries to enter the same type of migration-disable
region, it will acquire the associated lock, thus priority-boosting the
task that is already in that region.)

One stupid-but-tractable way to model this is to have an interarrival
rate for the various process priorities, and then calculate the odds of
(1) a higher priority process arriving while the low-priority one is
in a *-disable region and (2) that higher priority process needing to
enter a conflicting *-disable region.  This would give you some measure
of the added boosting load due to migration-disable as compared to
preemption-disable.
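Assuming Poisson interarrivals, that model might be sketched as follows (all parameters are hypothetical inputs; the helper name is illustrative):

```python
import math

def boost_event_odds(arrival_rate_hz, region_len_s, p_conflict):
    """Stupid-but-tractable model from the text: with Poisson arrivals
    of higher-priority tasks at arrival_rate_hz, the chance that at
    least one arrives during a *-disable region of region_len_s
    seconds is 1 - exp(-rate * len); scaling by the chance it needs a
    conflicting region gives the per-region odds of an extra
    preemption/boost event."""
    p_arrival = 1.0 - math.exp(-arrival_rate_hz * region_len_s)
    return p_arrival * p_conflict
```

Comparing this value for typical preempt-disable versus migrate-disable region lengths would give the added load due to the migrate-disable conversion.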

Would this sort of result be useful?

> > If so, my take is that the latency
> > to the highest-priority task decreases by an amount roughly equal to
> > the duration of the longest preempt_disable() region that turned into a
> > migration-disable region, while that to the lowest-priority task increases
> > by N-1 times the CPU overhead of the longest migration-disable region,
> > plus context switches.  (Yes, this is a very crude rule-of-thumb model.
> > A real model would have much higher mathematics and might use a more
> > detailed understanding of the workload.)
> > 
> > Or am I misunderstanding how all this works? 
> 
> No, I think you're gettin' it.

I was afraid of that.  ;-)

> My main worry with all this is that we have these insanely long !preempt
> regions in mainline that are now !migrate regions, and thus per all the
> above we could be looking at a substantial utilization loss.
> 
> Alternatively we could all be missing something far more horrid, but
> that might just be my paranoia talking.

Ah, good point -- if each migration-disable region is associated with
a lock, then you -could- allow migration and gain better utilization
at the expense of worse caching behavior.  Is that the concern?

							Thanx, Paul


* Re: On migrate_disable() and latencies
  2011-07-25 21:17     ` Paul E. McKenney
@ 2011-07-27 11:13       ` Peter Zijlstra
  2011-07-27 18:30         ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-07-27 11:13 UTC (permalink / raw)
  To: paulmck
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Kumar Gala, Ralf Baechle, rostedt,
	Nicholas Mc Guire

On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:

> > I suppose it is indeed. Even for the SoftRT case we need to make sure
> > the total utilization loss is indeed acceptable.
> 
> OK.  If you are doing strict priority, then everything below the highest
> priority is workload dependent. 

<snip throttling, that's a whole different thing>

>  The higher-priority
> tasks can absolutely starve the lower-priority ones, with or without
> the migrate-disable capability.

Sure, that's how FIFO works, but it also relies on the fact that once
your high priority task completes the lower priority task resumes.

The extension to SMP is that we run the m highest priority tasks on n
cpus, where m <= n. Any loss in utilization (idle time in this
particular case, but irq/preemption/migration and cache overhead are
also time not spent on the actual workload) works against that.

Now the WCET folks are all about quantifying the needs of applications
and the utilization limits of the OS etc. And while for SoftRT you can
relax quite a few of the various bounds, you still need to know them in
order to relax them (der Hofrat likes to move from worst case to avg case
IIRC).

> Another way of looking at it is from the viewpoint of the additional
> priority-boost events.  If preemption is disabled, the low-priority task
> will execute through the preempt-disable region without context switching.
> In contrast, given a migration-disable region, the low-priority task
> might be preempted and then boosted.  (If I understand correctly, if some
> higher-priority task tries to enter the same type of migration-disable
> region, it will acquire the associated lock, thus priority-boosting the
> task that is already in that region.)

No, there is no boosting involved; migrate_disable() isn't intrinsically
tied to a lock or other PI construct. We might need locks to keep some
of the per-cpu crap correct, but that, again, is a whole different ball
game.

But even if it was, I don't think PI will help any for this, we still
need to complete the various migrate_disable() sections, see below.

> One stupid-but-tractable way to model this is to have an interarrival
> rate for the various process priorities, and then calculate the odds of
> (1) a higher priority process arriving while the low-priority one is
> in a *-disable region and (2) that higher priority process needing to
> enter a conflicting *-disable region.  This would give you some measure
> of the added boosting load due to migration-disable as compared to
> preemption-disable.
> 
> Would this sort of result be useful?

Yes, that type of analysis can be used, and I guess we can measure
various variables related to that.

> > My main worry with all this is that we have these insanely long !preempt
> > regions in mainline that are now !migrate regions, and thus per all the
> > above we could be looking at a substantial utilization loss.
> > 
> > Alternatively we could all be missing something far more horrid, but
> > that might just be my paranoia talking.
> 
> Ah, good point -- if each migration-disable region is associated with
> a lock, then you -could- allow migration and gain better utilization
> at the expense of worse caching behavior.  Is that the concern?

I'm not seeing how that would be true. Suppose you have this stack of 4
migrate_disable() sections and 3 idle cpus: no amount of boosting will
make the already running task at the top of the stack go any faster, and
it needs to complete its migrate_disable section before it can be
migrated, equally so for the rest. So you still need
3*migrate-disable-period of time before all your cpus are busy again.

You can move another task to the top of the stack by boosting, but
you'll need 3 tasks to complete their respective migrate-disable
sections; it doesn't matter which task, so boosting doesn't change
anything.
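That argument can be checked with a toy enumeration (a sketch under the stated assumption of equal-length sections; the helper is illustrative):

```python
import itertools

def unwind_time(section_lengths, idle_cpus):
    """Only the task at the top of the stack runs; each section that
    completes frees one task to migrate to an idle cpu. The stack is
    unwound after min(depth - 1, idle_cpus) sections complete, in
    whatever order boosting imposes."""
    steps = min(len(section_lengths) - 1, idle_cpus)
    return sum(section_lengths[:steps])

# boosting only permutes the stack; with equal-length sections every
# order gives the same 3 * migrate-disable-period unwind time
all_orders = {unwind_time(list(p), 3)
              for p in itertools.permutations([100, 100, 100, 100])}
print(all_orders)  # -> {300}
```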


* Re: On migrate_disable() and latencies
  2011-07-27 11:13       ` Peter Zijlstra
@ 2011-07-27 18:30         ` Paul E. McKenney
  2011-07-28  5:50           ` Yong Zhang
  2011-07-28  7:54           ` Nicholas Mc Guire
  0 siblings, 2 replies; 15+ messages in thread
From: Paul E. McKenney @ 2011-07-27 18:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, linux-rt-users, Ingo Molnar, Carsten Emde,
	Clark Williams, Kumar Gala, Ralf Baechle, rostedt,
	Nicholas Mc Guire

On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:
> On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:
> 
> > > I suppose it is indeed. Even for the SoftRT case we need to make sure
> > > the total utilization loss is indeed acceptable.
> > 
> > OK.  If you are doing strict priority, then everything below the highest
> > priority is workload dependent. 
> 
> <snip throttling, that's a whole different thing>
> 
> >  The higher-priority
> > tasks can absolutely starve the lower-priority ones, with or without
> > the migrate-disable capability.
> 
> Sure, that's how FIFO works, but it also relies on the fact that once
> your high priority task completes the lower priority task resumes.
> 
> The extension to SMP is that we run the m highest priority tasks on n
> cpus ; where m <= n. Any loss in utilization (idle time in this
> particular case, but irq/preemption/migration and cache overhead are
> also time not spend on the actual workload.
> 
> Now the WCET folks are all about quantifying the needs of applications
> and the utilization limits of the OS etc. And while for SoftRT you can
> relax quite a few of the various bounds you still need to know them in
> order relax them (der Hofrat likes to move from worst case to avg case
> IIRC).

;-)

> > Another way of looking at it is from the viewpoint of the additional
> > priority-boost events.  If preemption is disabled, the low-priority task
> > will execute through the preempt-disable region without context switching.
> > In contrast, given a migration-disable region, the low-priority task
> > might be preempted and then boosted.  (If I understand correctly, if some
> > higher-priority task tries to enter the same type of migration-disable
> > region, it will acquire the associated lock, thus priority-boosting the
> > task that is already in that region.)
> 
> No, there is no boosting involved, migrate_disable() isn't intrinsically
> tied to a lock or other PI construct. We might need locks to keep some
> of the per-cpu crap correct, but that again, is a whole different ball
> game.
> 
> But even if it was, I don't think PI will help any for this, we still
> need to complete the various migrate_disable() sections, see below.

OK, got it.  I think, anyway.  I was incorrectly (or at least unhelpfully)
pulling in locks that might be needed to handle per-CPU variables.

> > One stupid-but-tractable way to model this is to have an interarrival
> > rate for the various process priorities, and then calculate the odds of
> > (1) a higher priority process arriving while the low-priority one is
> > in a *-disable region and (2) that higher priority process needing to
> > enter a conflicting *-disable region.  This would give you some measure
> > of the added boosting load due to migration-disable as compared to
> > preemption-disable.
> > 
> > Would this sort of result be useful?
> 
> Yes, such type of analysis can be used, and I guess we can measure
> various variables related to that.

OK, good.

> > > My main worry with all this is that we have these insane long !preempt
> > > regions in mainline that are now !migrate regions, and thus per all the
> > > above we could be looking at a substantial utilization loss.
> > > 
> > > Alternatively we could all be missing something far more horrid, but
> > > that might just be my paranoia talking.
> > 
> > Ah, good point -- if each migration-disable region is associated with
> > a lock, then you -could- allow migration and gain better utilization
> > at the expense of worse caching behavior.  Is that the concern?
> 
> I'm not seeing how that would be true, suppose you have this stack of 4
> migrate_disable() sections and 3 idle cpus, no amount of boosting will
> make the already running task at the top of the stack go any faster, and
> it needs to complete the migrate_disable section before it can be
> migrated, equally so for the rest, so you still need
> 3*migrate-disable-period of time before all your cpus are busy again.
> 
> You can move another task to the top of the stack by boosting, but
> you'll need 3 tasks to complete their resp migrate-disable section, it
> doesn't matter which task, so boosting doesn't change anything.

OK, so let me see if I understand what you are looking to model.

o	There are no locks.

o	There are a finite number of tasks with varying priorities.
	(I would initially work with a single task per priority
	level, but IIRC it is not hard to make multiple tasks per
	priority work.  Not a fan of infinite numbers of priorities,
	though!)

o	There are multiple CPUs.

o	Once a task enters a migrate-disable region, it must remain
	on that CPU.  (I will initially model the migrate-disable region
	as consuming a fixed amount of CPU.  If I wanted to really wuss
	out, I would model it as consuming an exponentially distributed
	amount of CPU.)

o	Tasks awakening outside of migrate-disable regions will pick
	the CPU running the lowest-priority task, whether or not this
	task is in migrate-disable state.  (At least I don't see
	anything in 3.0-rt3 that looks like a scheduling decision
	based on ->migrate_disable, perhaps due to blindness.)

o	For an example, if all CPUs except for one are running prio-99
	tasks, and the remaining CPU is running a prio-1 task in
	a migrate-disable region, if a prio-2 tasks awakens, it
	will preempt the prio-1 task.

	On the other hand, if at least one of the CPUs was idle,
	the prio-2 task would have instead run on that idle CPU.

o	The transition probabilities depend on the priority
	of the currently running migrate-disable CPU -- the higher
	that priority, the greater the chance that any preempting
	task will find some other CPU instead.

	The recurrence times depend on the number of tasks stacked
	up in migrate-disable regions on that CPU.

If this all holds, it would be possible to compute the probability
of a given migrate-disable region being preempted and if preempted,
the expected duration of that preemption, given the following
quantities as input:

o	The probability that a given CPU is running a task
	of priority P for each priority.  The usual way to
	estimate this is based on per-thread CPU utilizations.

o	The expected duration of migrate-disable regions.

o	The expected wakeups per second for tasks of each priority.

With the usual disclaimers about cheezy mathematical approximations
of reality and all that.

Would this be useful, or am I still missing the point?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: On migrate_disable() and latencies
  2011-07-27 18:30         ` Paul E. McKenney
@ 2011-07-28  5:50           ` Yong Zhang
  2011-07-28  7:01             ` Thomas Gleixner
  2011-07-28  7:54           ` Nicholas Mc Guire
  1 sibling, 1 reply; 15+ messages in thread
From: Yong Zhang @ 2011-07-28  5:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, linux-rt-users,
	Ingo Molnar, Carsten Emde, Clark Williams, Kumar Gala,
	Ralf Baechle, rostedt, Nicholas Mc Guire

On Wed, Jul 27, 2011 at 11:30:08AM -0700, Paul E. McKenney wrote:
> o	Tasks awakening outside of migrate-disable regions will pick
> 	the CPU running the lowest-priority task, whether or not this
> 	task is in migrate-disable state.  (At least I don't see
> 	anything in 3.0-rt3 that looks like a scheduling decision
> 	based on ->migrate_disable, perhaps due to blindness.)

I'm also confused here; it seems we just disable migration for RT tasks:
migrate_disable()
{
	...
		if (p->sched_class->set_cpus_allowed)
			p->sched_class->set_cpus_allowed(p, mask);
		p->rt.nr_cpus_allowed =	cpumask_weight(mask);
	...
}

Shouldn't we also forbid migration on !RT tasks?

Thanks,
Yong

-- 
Only stand for myself


* Re: On migrate_disable() and latencies
  2011-07-28  5:50           ` Yong Zhang
@ 2011-07-28  7:01             ` Thomas Gleixner
  2011-07-28  7:10               ` Yong Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2011-07-28  7:01 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Paul E. McKenney, Peter Zijlstra, LKML, linux-rt-users,
	Ingo Molnar, Carsten Emde, Clark Williams, Kumar Gala,
	Ralf Baechle, rostedt, Nicholas Mc Guire

On Thu, 28 Jul 2011, Yong Zhang wrote:

> On Wed, Jul 27, 2011 at 11:30:08AM -0700, Paul E. McKenney wrote:
> > o	Tasks awakening outside of migrate-disable regions will pick
> > 	the CPU running the lowest-priority task, whether or not this
> > 	task is in migrate-disable state.  (At least I don't see
> > 	anything in 3.0-rt3 that looks like a scheduling decision
> > 	based on ->migrate_disable, perhaps due to blindness.)
> 
> I'm also confused here; it seems we just disable migration for RT tasks:
> migrate_disable()
> {
> 	...
> 		if (p->sched_class->set_cpus_allowed)
> 			p->sched_class->set_cpus_allowed(p, mask);
> 		p->rt.nr_cpus_allowed =	cpumask_weight(mask);
> 	...
> }
> 
> Shouldn't we also forbid migration on !RT tasks?

We do. It's just that RT is the only sched class which has a
set_cpus_allowed() callback implemented and wants an update to its
rt.nr_cpus_allowed field.

       if (!p->migrate_disable) {
                if (p->sched_class && p->sched_class->set_cpus_allowed)
                        p->sched_class->set_cpus_allowed(p, new_mask);
                p->rt.nr_cpus_allowed = cpumask_weight(new_mask);
        }

The general part is here:

       cpumask_copy(&p->cpus_allowed, new_mask);

And tsk_cpus_allowed() does:
{
        if (p->migrate_disable)
                return cpumask_of(task_cpu(p));

        return &p->cpus_allowed;
}

which is the relevant information to keep any task pinned, independent
of its sched class.

Thanks,

	tglx


* Re: On migrate_disable() and latencies
  2011-07-28  7:01             ` Thomas Gleixner
@ 2011-07-28  7:10               ` Yong Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Yong Zhang @ 2011-07-28  7:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E. McKenney, Peter Zijlstra, LKML, linux-rt-users,
	Ingo Molnar, Carsten Emde, Clark Williams, Kumar Gala,
	Ralf Baechle, rostedt, Nicholas Mc Guire

On Thu, Jul 28, 2011 at 09:01:23AM +0200, Thomas Gleixner wrote:
> We do. It's just that RT is the only sched class which has a
> set_cpus_allowed() callback implemented and wants an update to its
> rt.nr_cpus_allowed field.
> 
>        if (!p->migrate_disable) {
>                 if (p->sched_class && p->sched_class->set_cpus_allowed)
>                         p->sched_class->set_cpus_allowed(p, new_mask);
>                 p->rt.nr_cpus_allowed = cpumask_weight(new_mask);
>         }
> 
> The general part is here:
> 
>        cpumask_copy(&p->cpus_allowed, new_mask);
> 
> And tsk_cpus_allowed() does:
> {
>         if (p->migrate_disable)
>                 return cpumask_of(task_cpu(p));
> 
>         return &p->cpus_allowed;
> }

Hmmm, sched-use-task-allowed-accessor.patch is the preparation :)

Clear now, thanks Thomas.

-Yong

-- 
Only stand for myself


* Re: On migrate_disable() and latencies
  2011-07-27 18:30         ` Paul E. McKenney
  2011-07-28  5:50           ` Yong Zhang
@ 2011-07-28  7:54           ` Nicholas Mc Guire
  2011-07-28 12:05             ` Paul E. McKenney
  1 sibling, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2011-07-28  7:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, linux-rt-users,
	Ingo Molnar, Carsten Emde, Clark Williams, Kumar Gala,
	Ralf Baechle, rostedt

On Wed, 27 Jul 2011, Paul E. McKenney wrote:

> On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:
> > On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:
> > 
> > > > I suppose it is indeed. Even for the SoftRT case we need to make sure
> > > > the total utilization loss is indeed acceptable.
> > > 
> > > OK.  If you are doing strict priority, then everything below the highest
> > > priority is workload dependent. 
> > 
> > <snip throttling, that's a whole different thing>
> > 
> > >  The higher-priority
> > > tasks can absolutely starve the lower-priority ones, with or without
> > > the migrate-disable capability.
> > 
> > Sure, that's how FIFO works, but it also relies on the fact that once
> > your high priority task completes the lower priority task resumes.
> > 
> > The extension to SMP is that we run the m highest priority tasks on n
> > cpus; where m <= n. Any loss in utilization hurts (idle time in this
> > particular case, but irq/preemption/migration and cache overhead are
> > also time not spent on the actual workload).
> > 
> > Now the WCET folks are all about quantifying the needs of applications
> > and the utilization limits of the OS etc. And while for SoftRT you can
> > relax quite a few of the various bounds, you still need to know them in
> > order to relax them (der Hofrat likes to move from worst case to avg case
> > IIRC).
>

It's not about worst case vs. average case; it's about using the
distribution rather than boundary values - boundary values are hard to
correlate with specific events.
 
> ;-)
> 
> > > Another way of looking at it is from the viewpoint of the additional
> > > priority-boost events.  If preemption is disabled, the low-priority task
> > > will execute through the preempt-disable region without context switching.
> > > In contrast, given a migration-disable region, the low-priority task
> > > might be preempted and then boosted.  (If I understand correctly, if some
> > > higher-priority task tries to enter the same type of migration-disable
> > > region, it will acquire the associated lock, thus priority-boosting the
> > > task that is already in that region.)
> > 
> > No, there is no boosting involved, migrate_disable() isn't intrinsically
> > tied to a lock or other PI construct. We might need locks to keep some
> > of the per-cpu crap correct, but that again, is a whole different ball
> > game.
> > 
> > But even if it was, I don't think PI will help any for this, we still
> > need to complete the various migrate_disable() sections, see below.
> 
> OK, got it.  I think, anyway.  I was incorrectly (or at least unhelpfully)
> pulling in locks that might be needed to handle per-CPU variables.
> 
> > > One stupid-but-tractable way to model this is to have an interarrival
> > > rate for the various process priorities, and then calculate the odds of
> > > (1) a higher priority process arriving while the low-priority one is
> > > in a *-disable region and (2) that higher priority process needing to
> > > enter a conflicting *-disable region.  This would give you some measure
> > > of the added boosting load due to migration-disable as compared to
> > > preemption-disable.
> > > 
> > > Would this sort of result be useful?
> > 
> > Yes, such type of analysis can be used, and I guess we can measure
> > various variables related to that.
> 
> OK, good.
> 
> > > > My main worry with all this is that we have these insane long !preempt
> > > > regions in mainline that are now !migrate regions, and thus per all the
> > > > above we could be looking at a substantial utilization loss.
> > > > 
> > > > Alternatively we could all be missing something far more horrid, but
> > > > that might just be my paranoia talking.
> > > 
> > > Ah, good point -- if each migration-disable region is associated with
> > > a lock, then you -could- allow migration and gain better utilization
> > > at the expense of worse caching behavior.  Is that the concern?
> > 
> > I'm not seeing how that would be true, suppose you have this stack of 4
> > migrate_disable() sections and 3 idle cpus, no amount of boosting will
> > make the already running task at the top of the stack go any faster, and
> > it needs to complete the migrate_disable section before it can be
> > migrated, equally so for the rest, so you still need
> > 3*migrate-disable-period of time before all your cpus are busy again.
> > 
> > You can move another task to the top of the stack by boosting, but
> > you'll need 3 tasks to complete their resp migrate-disable section, it
> > doesn't matter which task, so boosting doesn't change anything.
> 
> OK, so let me see if I understand what you are looking to model.
> 
> o	There are no locks.
> 
> o	There are a finite number of tasks with varying priorities.
> 	(I would initially work with a single task per priority
> 	level, but IIRC it is not hard to make multiple tasks per
> 	priority work.  Not a fan of infinite numbers of priorities,
> 	though!)
> 
> o	There are multiple CPUs.
> 
> o	Once a task enters a migrate-disable region, it must remain
> 	on that CPU.  (I will initially model the migrate-disable region
> 	as consuming a fixed amount of CPU.  If I wanted to really wuss
> 	out, I would model it as consuming an exponentially distributed
> 	amount of CPU.)
> 
> o	Tasks awakening outside of migrate-disable regions will pick
> 	the CPU running the lowest-priority task, whether or not this
> 	task is in migrate-disable state.  (At least I don't see
> 	anything in 3.0-rt3 that looks like a scheduling decision
> 	based on ->migrate_disable, perhaps due to blindness.)
>

This might be a simple heuristic to minimize the probability of stacking
in the first place.
 
> o	For an example, if all CPUs except for one are running prio-99
> 	tasks, and the remaining CPU is running a prio-1 task in
> 	a migrate-disable region, if a prio-2 tasks awakens, it
> 	will preempt the prio-1 task.

All CPUs are utilized, so no utilization loss at all in that scenario.

> 
> 	On the other hand, if at least one of the CPUs was idle,
> 	the prio-2 task would have instead run on that idle CPU.

so what you need to add to the model is the probability of the transitional
event:

   * prio-2 task preempts prio-1 task because all CPUs are idle
   * at least one CPU becomes idle while the prio-1 task is blocked for
     migration due to migrate-disable + preempted by the prio-2 task

only in this combination does the system suffer a utilization penalty.

> 
> o	The transition probabilities depend on the priority
> 	of the currently running migrate-disable CPU -- the higher
> 	that priority, the greater the chance that any preempting
> 	task will find some other CPU instead.
> 
> 	The recurrence times depend on the number of tasks stacked
> 	up in migrate-disable regions on that CPU.
> 
> If this all holds, it would be possible to compute the probability
> of a given migrate-disable region being preempted and if preempted,
> the expected duration of that preemption, given the following
> quantities as input:
> 
> o	The probability that a given CPU is running a task
> 	of priority P for each priority.  The usual way to
> 	estimate this is based on per-thread CPU utilizations.
> 
> o	The expected duration of migrate-disable regions.
> 
> o	The expected wakeups per second for tasks of each priority.
> 
> With the usual disclaimers about cheezy mathematical approximations
> of reality and all that.
> 
> Would this be useful, or am I still missing the point?
>

to get an estimation of the latency impact - but to get an estimate of the
impact on system utilization you would need to include the probability that a
different CPU is idle in the system and would in principle allow running
one of the tasks that can't be migrated. As I understood it, the initial
question was whether migrate_disable has a relevant impact on system
utilization in multicore systems. For this question I guess two of the key
parameters are

 * probability that migrate-disable stacking occurs
 * probability of an idle CPU transition while stacking persists

I guess the probability of an idle transition of a CPU is hard to model as it
is very profile specific.

hofrat


* Re: On migrate_disable() and latencies
  2011-07-28  7:54           ` Nicholas Mc Guire
@ 2011-07-28 12:05             ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2011-07-28 12:05 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Peter Zijlstra, Thomas Gleixner, LKML, linux-rt-users,
	Ingo Molnar, Carsten Emde, Clark Williams, Kumar Gala,
	Ralf Baechle, rostedt

On Thu, Jul 28, 2011 at 09:54:37AM +0200, Nicholas Mc Guire wrote:
> On Wed, 27 Jul 2011, Paul E. McKenney wrote:
> 
> > On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:
> > > On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:
> > > 
> > > > > I suppose it is indeed. Even for the SoftRT case we need to make sure
> > > > > the total utilization loss is indeed acceptable.
> > > > 
> > > > OK.  If you are doing strict priority, then everything below the highest
> > > > priority is workload dependent. 
> > > 
> > > <snip throttling, that's a whole different thing>
> > > 
> > > >  The higher-priority
> > > > tasks can absolutely starve the lower-priority ones, with or without
> > > > the migrate-disable capability.
> > > 
> > > Sure, that's how FIFO works, but it also relies on the fact that once
> > > your high priority task completes the lower priority task resumes.
> > > 
> > > The extension to SMP is that we run the m highest priority tasks on n
> > > cpus; where m <= n. Any loss in utilization hurts (idle time in this
> > > particular case, but irq/preemption/migration and cache overhead are
> > > also time not spent on the actual workload).
> > > 
> > > Now the WCET folks are all about quantifying the needs of applications
> > > and the utilization limits of the OS etc. And while for SoftRT you can
> > > relax quite a few of the various bounds, you still need to know them in
> > > order to relax them (der Hofrat likes to move from worst case to avg case
> > > IIRC).
> 
> It's not about worst case vs. average case; it's about using the
> distribution rather than boundary values - boundary values are hard to
> correlate with specific events.
> 
> > ;-)
> > 
> > > > Another way of looking at it is from the viewpoint of the additional
> > > > priority-boost events.  If preemption is disabled, the low-priority task
> > > > will execute through the preempt-disable region without context switching.
> > > > In contrast, given a migration-disable region, the low-priority task
> > > > might be preempted and then boosted.  (If I understand correctly, if some
> > > > higher-priority task tries to enter the same type of migration-disable
> > > > region, it will acquire the associated lock, thus priority-boosting the
> > > > task that is already in that region.)
> > > 
> > > No, there is no boosting involved, migrate_disable() isn't intrinsically
> > > tied to a lock or other PI construct. We might need locks to keep some
> > > of the per-cpu crap correct, but that again, is a whole different ball
> > > game.
> > > 
> > > But even if it was, I don't think PI will help any for this, we still
> > > need to complete the various migrate_disable() sections, see below.
> > 
> > OK, got it.  I think, anyway.  I was incorrectly (or at least unhelpfully)
> > pulling in locks that might be needed to handle per-CPU variables.
> > 
> > > > One stupid-but-tractable way to model this is to have an interarrival
> > > > rate for the various process priorities, and then calculate the odds of
> > > > (1) a higher priority process arriving while the low-priority one is
> > > > in a *-disable region and (2) that higher priority process needing to
> > > > enter a conflicting *-disable region.  This would give you some measure
> > > > of the added boosting load due to migration-disable as compared to
> > > > preemption-disable.
> > > > 
> > > > Would this sort of result be useful?
> > > 
> > > Yes, such type of analysis can be used, and I guess we can measure
> > > various variables related to that.
> > 
> > OK, good.
> > 
> > > > > My main worry with all this is that we have these insane long !preempt
> > > > > regions in mainline that are now !migrate regions, and thus per all the
> > > > > above we could be looking at a substantial utilization loss.
> > > > > 
> > > > > Alternatively we could all be missing something far more horrid, but
> > > > > that might just be my paranoia talking.
> > > > 
> > > > Ah, good point -- if each migration-disable region is associated with
> > > > a lock, then you -could- allow migration and gain better utilization
> > > > at the expense of worse caching behavior.  Is that the concern?
> > > 
> > > I'm not seeing how that would be true, suppose you have this stack of 4
> > > migrate_disable() sections and 3 idle cpus, no amount of boosting will
> > > make the already running task at the top of the stack go any faster, and
> > > it needs to complete the migrate_disable section before it can be
> > > migrated, equally so for the rest, so you still need
> > > 3*migrate-disable-period of time before all your cpus are busy again.
> > > 
> > > You can move another task to the top of the stack by boosting, but
> > > you'll need 3 tasks to complete their resp migrate-disable section, it
> > > doesn't matter which task, so boosting doesn't change anything.
> > 
> > OK, so let me see if I understand what you are looking to model.
> > 
> > o	There are no locks.
> > 
> > o	There are a finite number of tasks with varying priorities.
> > 	(I would initially work with a single task per priority
> > 	level, but IIRC it is not hard to make multiple tasks per
> > 	priority work.  Not a fan of infinite numbers of priorities,
> > 	though!)
> > 
> > o	There are multiple CPUs.
> > 
> > o	Once a task enters a migrate-disable region, it must remain
> > 	on that CPU.  (I will initially model the migrate-disable region
> > 	as consuming a fixed amount of CPU.  If I wanted to really wuss
> > 	out, I would model it as consuming an exponentially distributed
> > 	amount of CPU.)
> > 
> > o	Tasks awakening outside of migrate-disable regions will pick
> > 	the CPU running the lowest-priority task, whether or not this
> > 	task is in migrate-disable state.  (At least I don't see
> > 	anything in 3.0-rt3 that looks like a scheduling decision
> > 	based on ->migrate_disable, perhaps due to blindness.)
> 
> This might be a simple heuristic to minimize the probability of stacking
> in the first place.

Indeed, one heuristic would be to preferentially preempt a CPU without
any runnable migrate-disable tasks, for example, use migrate-disable
as another bit in the priority comparison -- if two CPUs at a given
priority are available, preempt the one without the migrate-disable
runnable task.

But Peter would probably want to know how effective that would be.

> > o	For an example, if all CPUs except for one are running prio-99
> > 	tasks, and the remaining CPU is running a prio-1 task in
> > 	a migrate-disable region, if a prio-2 tasks awakens, it
> > 	will preempt the prio-1 task.
> 
> all CPUs utilized so no utilization loss at all in that szenario 
> 
> > 	On the other hand, if at least one of the CPUs was idle,
> > 	the prio-2 task would have instead run on that idle CPU.
> 
> so what you need to add to the model is the probability of the transitional
> event:
> 
>    * prio-2 task preempts prio-1 task because all CPUs are idle

s/idle/busy/, correct?

>    * at least one CPU becomes idle while the prio-1 task is blocked for
>      migration due to migrate-disable + preempted by the prio-2 task
> 
> only in this combination does the system suffer a utilization penalty.

Yep!  Or at least the CPU has its priority drop to below prio-1, for
example, starts running a non-realtime task, but yes.

> > o	The transition probabilities depend on the priority
> > 	of the currently running migrate-disable CPU -- the higher
> > 	that priority, the greater the chance that any preempting
> > 	task will find some other CPU instead.
> > 
> > 	The recurrence times depend on the number of tasks stacked
> > 	up in migrate-disable regions on that CPU.
> > 
> > If this all holds, it would be possible to compute the probability
> > of a given migrate-disable region being preempted and if preempted,
> > the expected duration of that preemption, given the following
> > quantities as input:
> > 
> > o	The probability that a given CPU is running a task
> > 	of priority P for each priority.  The usual way to
> > 	estimate this is based on per-thread CPU utilizations.
> > 
> > o	The expected duration of migrate-disable regions.
> > 
> > o	The expected wakeups per second for tasks of each priority.
> > 
> > With the usual disclaimers about cheezy mathematical approximations
> > of reality and all that.
> > 
> > Would this be useful, or am I still missing the point?
> 
> to get an estimation of the latency impact - but to get an estimate of the
> impact on system utilization you would need to include the probability that a
> different CPU is idle in the system and would in principle allow running
> one of the tasks that can't be migrated. As I understood it, the initial
> question was whether migrate_disable has a relevant impact on system
> utilization in multicore systems. For this question I guess two of the key
> parameters are
> 
>  * probability that migrate-disable stacking occurs
>  * probability of an idle CPU transition while stacking persists

Or at least the probability of a CPU transitioning to a lower
priority than one of the migrate-disable tasks.

> I guess the probability of an idle transition of a CPU is hard to model as it
> is very profile specific.

It is profile specific, but the transition probabilities could be
collected.

One approach is to extend the model to cover all of the CPUs.  If we ignore
non-RT tasks, and if there is one task per priority per CPU, then the
data required for tasks is the transition probabilities between blocked,
running, and migrate-disable modes.  The state of a given task must
also include the CPU if in migrate-disable mode.  Then the "badness" of
the migrate-disable scheme would be the probability of being in states
where a migrate-disable task on one processor was preempted by
migrate-disable tasks on that same processor, and where some other
processor was running a lower-priority task.

This does do a bit of combinatorial explosion, but is tractable for small
numbers of CPUs and tasks.  For example, four tasks on two CPUs gives
256 states:  each of the four tasks can be idle, running, or running in
migrate-disable on either of the two CPUs - four states per task, so
4^4 = 256 states.
This is more than I want to draw, but could be handled automatically.
Debugging and validation of the model would be a bit of a pain, of course.
And approximation of transition probabilities with an exponential 
distribution seems warranted.

For N CPUs and T tasks with P possible priorities (so that T
is equal to N*P), the number of states is given by:

	S = (N+2)^(N*P)

I could expect to calculate probabilities only for this tiniest model,
and maybe 3 CPUs with two priorities, because that involves an SxS matrix.
Unless there is some good sparse-matrix software out there.

But this gives a measure of under-utilization, along with failure to run
a high-priority task (due to its being migrate-disable) while a CPU is
running a lower-priority task.

I bet it is possible to make use of expected transition times: if in a
"badness" state, how long to get to a non-"badness" state?  This should
require memory size proportional to the number of states rather than to
the square of the number of states.  This would permit looking at
somewhat larger (though still quite small) scenarios.

Thoughts?

							Thanx, Paul


end of thread, other threads:[~2011-07-28 12:06 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-22 10:19 On migrate_disable() and latencies Peter Zijlstra
2011-07-22 10:49 ` Peter Zijlstra
2011-07-22 14:57   ` Peter Zijlstra
2011-07-22 17:45 ` Nicholas Mc Guire
2011-07-25  8:21   ` Peter Zijlstra
2011-07-23  0:39 ` Paul E. McKenney
2011-07-25  8:30   ` Peter Zijlstra
2011-07-25 21:17     ` Paul E. McKenney
2011-07-27 11:13       ` Peter Zijlstra
2011-07-27 18:30         ` Paul E. McKenney
2011-07-28  5:50           ` Yong Zhang
2011-07-28  7:01             ` Thomas Gleixner
2011-07-28  7:10               ` Yong Zhang
2011-07-28  7:54           ` Nicholas Mc Guire
2011-07-28 12:05             ` Paul E. McKenney
