Re: [PATCH v2] sched: rt: Make RT capacity aware

From: Pavan Kondeti <pkondeti@codeaurora.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] sched: rt: Make RT capacity aware
Date: Mon, 3 Feb 2020 11:02:35 +0530	[thread overview]
Message-ID: <20200203053235.GE27398@codeaurora.org> (raw)
In-Reply-To: <CAEU1=PnYryM26F-tNAT0JVUoFcygRgE374JiBeJPQeTEoZpANg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7397 bytes --]

Hi Qais,

On Sat, Feb 01, 2020 at 07:08:34AM +0530, Pavan Kondeti wrote:
> Hi Qais,
> 
> On Fri, Jan 31, 2020 at 9:04 PM Qais Yousef <qais.yousef@arm.com> wrote:
> 
> > Hi Pavan
> >
> > On 01/31/20 15:36, Pavan Kondeti wrote:
> > > Hi Qais,
> > >
> > > On Wed, Oct 09, 2019 at 11:46:11AM +0100, Qais Yousef wrote:
> >
> > [...]
> >
> > > >
> > > > For RT we don't have a per task utilization signal and we lack any
> > > > information in general about what performance requirement the RT task
> > > > needs. But with the introduction of uclamp, RT tasks can now control
> > > > that by setting uclamp_min to guarantee a minimum performance point.
> >
> > [...]
> >
> > > > ---
> > > >
> > > > Changes in v2:
> > > >     - Use cpupri_find() to check the fitness of the task instead of
> > > >       sprinkling find_lowest_rq() with several checks of
> > > >       rt_task_fits_capacity().
> > > >
> > > >       The selected implementation opted to pass the fitness function
> > as an
> > > >       argument rather than call rt_task_fits_capacity() capacity which
> > is
> > > >       a cleaner to keep the logical separation of the 2 modules; but it
> > > >       means the compiler has less room to optimize
> > rt_task_fits_capacity()
> > > >       out when it's a constant value.
> > > >
> > > > The logic is not perfect. For example if a 'small' task is occupying a
> > big CPU
> > > > and another big task wakes up; we won't force migrate the small task
> > to clear
> > > > the big cpu for the big task that woke up.
> > > >
> > > > IOW, the logic is best effort and can't give hard guarantees. But
> > improves the
> > > > current situation where a task can randomly end up on any CPU
> > regardless of
> > > > what it needs. ie: without this patch an RT task can wake up on a big
> > or small
> > > > CPU, but with this it will always wake up on a big CPU (assuming the
> > big CPUs
> > > > aren't overloaded) - hence provide a consistent performance.
> >
> > [...]
> >
> > > I understand that RT tasks run on BIG cores by default when uclamp is
> > enabled.
> > > Can you tell what happens when we have more runnable RT tasks than the
> > BIG
> > > CPUs? Do they get packed on the BIG CPUs or eventually silver CPUs pull
> > those
> > > tasks? Since rt_task_fits_capacity() is considered during wakeup, push
> > and
> > > pull, the tasks may get packed on BIG forever. Is my understanding
> > correct?
> >
> > I left up the relevant part from the commit message and my 'cover-letter'
> > above
> > that should contain answers to your question.
> >
> > In short, the logic is best effort and isn't a hard guarantee. When the
> > system
> > is overloaded we'll still spread, and a task that needs a big core might
> > end up
> > on a little one. But AFAIU with RT, if you really want guarantees you need
> > to
> > do some planning otherwise there are no guarantees in general that your
> > task
> > will get what it needs.
> >
> > But I understand your question is for the general purpose case. I've
> > hacked my
> > notebook to run a few tests for you
> >
> >
> > https://gist.github.com/qais-yousef/cfe7487e3b43c3c06a152da31ae09101
> >
> > Look at the diagrams in "Test {1, 2, 3} Results". I spawned 6 tasks which
> > match
> > the 6 cores on the Juno I ran on. Based on Linus' master from a couple of
> > days.
> >
> > Note on Juno cores 1 and 2 are the big cors. 'b_*' and 'l_*' are the task
> > names
> > which are remnants from my previous testing where I spawned different
> > numbers
> > of big and small tasks.
> >
> > I repeat the same tests 3 times to demonstrate the repeatability. The logic
> > causes 2 tasks to run on a big CPU, but there's spreading. IMO on a general
> > purpose system this is a good behavior. On a real time system that needs
> > better
> > guarantee then there's no alternative to doing proper RT planning.
> >
> > In the last test I just spawn 2 tasks which end up on the right CPUs, 1
> > and 2.
> > On system like Android my observations has been that there are very little
> > concurrent RT tasks active at the same time. So if there are some tasks in
> > the
> > system that do want to be on the big CPU, they most likely to get that
> > guarantee. Without this patch what you get is completely random.
> >
> >
> Thanks for the results. I see that tasks are indeed spreading on to silver.
> However it is not
> clear to me from the code how tasks would get spread. Because cpupri_find()
> never returns
> little CPU in the lowest_mask because RT task does not fit on little CPU.
> So from wake up
> path, we place the task on the previous CPU or BIG CPU. The push logic also
> has the
> RT capacity checks, so an overloaded BIG CPU may not push tasks to an idle
> (RT free) little CPU.
> 
> 

I pulled your patch on top of v5.5 and run the below rt-app test on SDM845
platform. We expect 5 RT tasks to spread on different CPUs which was happening
without the patch. With the patch, I see one of the task got woken up on a CPU
which is running another RT task.

{
	"tasks" : {
		"big-task" : {
			"instance" : 5,
			"loop" : 10,
			"sleep" : 100000,
			"runtime" : 100000,
		},
	},
	"global" : {
		"duration" : -1,
		"calibration" : 720,
		"default_policy" : "SCHED_FIFO",
		"pi_enabled" : false,
		"lock_pages" : false,
		"logdir" : "/",
		"log_basename" : "rt-app2",
		"ftrace" : false,
		"gnuplot" : false
	}
}

Full trace is attached. Copy/pasting the snippet where it shows packing is
happening. The custom trace_printk are added in cpupri_find() before calling
fitness_fn(). As you can see pid=535 is woken on CPU#7 where pid=538 RT task
is runnning. Right after waking, the push is tried but it did not work either.

This is not a serious problem for us since we don't set RT tasks
uclamp.min=1024 . However, it changed the behavior and might introduce latency
for RT tasks on b.L platforms running the upstream kernel as is.

        big-task-538   [007] d.h.   403.401633: irq_handler_entry: irq=3 name=arch_timer
        big-task-538   [007] d.h2   403.401633: sched_waking: comm=big-task pid=535 prio=89 target_cpu=007
        big-task-538   [007] d.h2   403.401635: cpupri_find: before task=big-task-535 lowest_mask=f
        big-task-538   [007] d.h2   403.401636: cpupri_find: after task=big-task-535 lowest_mask=0
        big-task-538   [007] d.h2   403.401637: cpupri_find: it comes here
        big-task-538   [007] d.h2   403.401638: find_lowest_rq: task=big-task-535 ret=0 lowest_mask=0
        big-task-538   [007] d.h3   403.401640: sched_wakeup: comm=big-task pid=535 prio=89 target_cpu=007
        big-task-538   [007] d.h3   403.401641: cpupri_find: before task=big-task-535 lowest_mask=f
        big-task-538   [007] d.h3   403.401642: cpupri_find: after task=big-task-535 lowest_mask=0
        big-task-538   [007] d.h3   403.401642: cpupri_find: it comes here
        big-task-538   [007] d.h3   403.401643: find_lowest_rq: task=big-task-535 ret=0 lowest_mask=0
        big-task-538   [007] d.h.   403.401644: irq_handler_exit: irq=3 ret=handled
        big-task-538   [007] d..2   403.402413: sched_switch: prev_comm=big-task prev_pid=538 prev_prio=89 prev_state=S ==> next_comm=big-task next_pid=535 next_prio=89

Thanks,
Pavan
-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

[-- Attachment #2: trace.tar.gz --]
[-- Type: application/octet-stream, Size: 31906 bytes --]