From: Steven Rostedt <rostedt@goodmis.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Clark Williams <williams@redhat.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH] sched/rt: Add comments describing the RT IPI pull method
Date: Tue, 28 Feb 2017 15:47:53 -0500	[thread overview]
Message-ID: <20170228154753.1543d421@gandalf.local.home> (raw)
In-Reply-To: <20170220111235.GO6500@twins.programming.kicks-ass.net>

On Mon, 20 Feb 2017 12:12:35 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

Hi Peter,

I'm finally getting around to updating this patch.
 
> > +/*
> > + * When a high priority task schedules out from a CPU and a lower priority
> > + * task is scheduled in, a check is made to see if there's any RT tasks
> > + * on other CPUs that are waiting to run because a higher priority RT task
> > + * is currently running on its CPU. In this case, the CPU with multiple RT
> > + * tasks queued on it (overloaded) needs to be notified that a CPU has opened
> > +*  up that may be able to run its sleeping RT task.  
> 
> Whitespace hickup

Not sure how these got in. I don't see it in my code, but they are
definitely in my patch. Maybe I fixed it before sending it again :-/

> 
> Also, incorrect use of sleeping here, generally in the scheduler that
> means blocked, not on runqueue, as in requires a wakeup.

I updated this.

> 
> > + *
> > + * Instead of taking the rq lock of the overloaded CPU, which could cause
> > + * a huge contention on boxes with several cores,  
> 
> It is not immediately obvious to the casual reader _why_ this is so.

I added some background to the original issue that this patch solved.

> 
>                                                      if lots of CPUs have just
> > + * scheduled a lower priority task and there's only one overloaded CPU,
> > + * these CPUs that are now running a lower priority task will simply send an
> > + * IPI to the overloaded CPU with multiple RT tasks queued on it.
> > + *
> > + * Instead of sending an IPI to all CPUs that have overloaded RT tasks,
> > + * only a single IPI is sent to one CPU. That one will try to push off its
> > + * overloaded task and then send an IPI to the next CPU that has overloaded RT
> > + * tasks. This stops when all CPUs with overloaded RT tasks have completed.
> > + * Just because a CPU may have pushed off its own overloaded RT task does
> > + * not mean it should stop sending the IPI around to other overloaded CPUs.
> > + * There may be another RT task waiting to run on one of those CPUs that are
> > + * of higher priority than the one that was just pushed.  
> 
> My brain had a bit of bother following the verbosity there. Maybe cut
> this into parts;
> 
>  - broadcast IPIs are bad, $reason.
> 
>  - Therefore send a single IPI that visits one after another.
> 
>  - termination condition for visitor..

I rewrote the above, hopefully it will be better.
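For what it's worth, here's a rough userspace sketch of the traversal
I'm describing (illustrative only, not the kernel code; the names are
made up):

```c
#define NR_CPUS 8

/* Userspace model of the roaming IPI: one IPI hops from overloaded
 * CPU to overloaded CPU, starting after the CPU that noticed the
 * priority drop.  Each stop tries to push a task, but a successful
 * push does NOT end the chain -- a higher priority task may still be
 * waiting on another overloaded CPU.  The chain terminates only once
 * every overloaded CPU has been visited. */
static int run_ipi_chain(const int overloaded[NR_CPUS], int src_cpu,
			 int visited[NR_CPUS])
{
	int n = 0;

	for (int hop = 1; hop <= NR_CPUS; hop++) {
		int cpu = (src_cpu + hop) % NR_CPUS;

		if (!overloaded[cpu])
			continue;

		visited[n++] = cpu;	/* IPI handler runs here, tries a push */
		/* push or no push, keep the IPI circulating */
	}
	return n;
}
```

With CPUs 1 and 5 overloaded and CPU 3 triggering, this visits CPU 5
and then CPU 1, which is the order the comment describes.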

> 
> > + * An optimization that could possibly be made is to make a CPU array similar
> > + * to the cpupri array mask of all running RT tasks, but for the overloaded
> > + * case, then the IPI could be sent to only the CPU with the highest priority
> > + * RT task waiting, and that CPU could send off further IPIs to the CPU with
> > + * the next highest waiting task. Since the overloaded case is much less likely
> > + * to happen, the complexity of this implementation may not be worth it.
> > + * Instead, just send an IPI around to all overloaded CPUs.  
> 
> That's a lot of maybe..

Yeah, I'm up in the air about this. Not sure if it even belongs. I put
it in as a "we thought about it, but decided against it, if anyone
would like to try, feel free". But maybe it's best just to nuke this
paragraph.


> 
> > + *
> > + * The rq->rt.push_flags holds the status of the IPI that is going around.
> > + * A run queue can only send out a single IPI at a time. The possible flags
> > + * for rq->rt.push_flags are:
> > + *
> > + *    (None or zero):		No IPI is going around for the current rq
> > + *    RT_PUSH_IPI_EXECUTING:	An IPI for the rq is being passed around
> > + *    RT_PUSH_IPI_RESTART:	The priority of the running task for the rq
> > + *				  has changed, and the IPI should restart
> > + *				  circulating the overloaded CPUs again.  
> 
> whitespace messup

Again, strange.
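In case it helps, the intended state machine is small enough to model
(flag values here are illustrative, not necessarily what the kernel
uses):

```c
/* Illustrative flag values; the real definitions live in the kernel. */
#define RT_PUSH_IPI_EXECUTING	(1 << 0)
#define RT_PUSH_IPI_RESTART	(1 << 1)

/* A runqueue may only have one IPI in flight.  If another priority
 * change happens while that IPI is still circulating, we do not send
 * a second one; we just mark the in-flight IPI to restart its scan.
 * Returns 1 if the caller should send a fresh IPI. */
static int request_push(int *push_flags)
{
	if (*push_flags & RT_PUSH_IPI_EXECUTING) {
		*push_flags |= RT_PUSH_IPI_RESTART;
		return 0;		/* piggy-back on the in-flight IPI */
	}
	*push_flags = RT_PUSH_IPI_EXECUTING;
	return 1;			/* caller sends the first IPI */
}
```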

> 
> > + *
> > + * rq->rt.push_cpu contains the CPU that is being sent the IPI. It is updated
> > + * before sending to the next CPU.
> > + *
> > + * When an rq schedules a lower priority task than what was currently
> > + * running, the next CPU with overloaded RT tasks is examined first.
> > + * That is, if CPU 1 and 5 are overloaded, and CPU 3 schedules a lower
> > + * priority task, it will send an IPI first to CPU 5, then CPU 5 will
> > + * send to CPU 1 if it is still overloaded.   
> 
> + $reason for this

Added a reason.
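The "next overloaded CPU" walk is basically this (userspace sketch;
the kernel uses a cpumask rather than an array):

```c
#define NR_CPUS 8

/* Find the next overloaded CPU after @cpu, wrapping around past the
 * highest CPU number.  Returns -1 if none are overloaded.  E.g. with
 * CPUs 1 and 5 overloaded, starting from CPU 3 finds CPU 5, and
 * starting from CPU 5 wraps around to find CPU 1. */
static int next_overloaded_cpu(const int overloaded[NR_CPUS], int cpu)
{
	for (int i = 1; i <= NR_CPUS; i++) {
		int next = (cpu + i) % NR_CPUS;

		if (overloaded[next])
			return next;
	}
	return -1;
}
```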

> 
> 
>                                                CPU 1 will clear the
> > + * rq->rt.push_flags if RT_PUSH_IPI_RESTART is not set.
> > + *
> > + * The first CPU to notice IPI_RESTART is set, will clear that flag and then
> > + * send an IPI to the next overloaded CPU after the rq->cpu and not the next
> > + * CPU after push_cpu. That is, if CPU 1, 4 and 5 are overloaded when CPU 3
> > + * schedules a lower priority task, and the IPI_RESTART gets set while the
> > + * handling is being done on CPU 5, it will clear the flag and send it back to
> > + * CPU 4 instead of CPU 1.
> > + *
> > + * Note, the above logic can be disabled by turning off the sched_feature
> > + *  RT_PUSH_IPI. Then the rq lock of the overloaded CPU will simply be
> > + *  taken by the CPU requesting a pull and the waiting RT task will be pulled
> > + *  by that CPU. This may be fine for machines with few CPUs.  
> 
> Also whitespace fail.
> 
> In general, I feel this thing could do with a dose of _why_, we can get
> details of the how from the code.

I'll send out a v2. I kept in that one paragraph with a lot of
"maybes"; do you think I should nuke it?

-- Steve


Thread overview: 3+ messages
2017-02-17 21:20 [PATCH] sched/rt: Add comments describing the RT IPI pull method Steven Rostedt (VMware)
2017-02-20 11:12 ` Peter Zijlstra
2017-02-28 20:47   ` Steven Rostedt [this message]
