From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>, Yi Wang <wang.yi59@zte.com.cn>,
	zhong.weidong@zte.com.cn, Yi Liu <liu.yi24@zte.com.cn>,
	Frederic Weisbecker <frederic@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2] sched/core: Don't mix isolcpus and housekeeping CPUs
Date: Thu, 25 Oct 2018 23:00:58 +0530
Message-ID: <20181025173058.GD18466@linux.vnet.ibm.com>
In-Reply-To: <20181025000707.GR3109@worktop.c.hoisthospitality.com>

> 
> That's completely broken. Nothing in the numa balancing path uses that
> variable and afaict preemption is actually enabled where that's used, so
> using that per-cpu variable at all is broken.
> 

I can demonstrate that even without numa balancing, there is
inconsistent behaviour with isolcpus on.

> 
> Both of you are fixing symptoms, not the cause.
> 

Okay.

> But it doesn't solve the problem.
> 
> You can create multiple partitions with cpusets but still have an
> unbound task in the root cgroup. That would suffer the exact same
> problems.
> 
> Thing is, load-balancing, of any kind, should respect sched_domains, and
> currently numa balancing barely looks at it.

Agreed that we should have looked at sched_domains. However, I still
believe task->cpus_allowed can't hold a mix of isolated and
non-isolated CPUs. Won't that lead to inconsistent behaviour?
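
To make concrete what I mean by a "mixed" mask, here is a minimal
sketch. I am assuming housekeeping_cpumask(HK_FLAG_DOMAIN) is the
right mask to compare against, and cpumask_mixes_isolcpus() is a
hypothetical helper name, not an existing kernel function:

	#include <linux/cpumask.h>
	#include <linux/sched/isolation.h>

	/*
	 * Sketch only: true if @mask spans both housekeeping and
	 * isolated CPUs, i.e. it overlaps the housekeeping mask but
	 * is not fully contained in it.
	 */
	static bool cpumask_mixes_isolcpus(const struct cpumask *mask)
	{
		const struct cpumask *hk = housekeeping_cpumask(HK_FLAG_DOMAIN);

		return cpumask_intersects(mask, hk) &&
		       !cpumask_subset(mask, hk);
	}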

> 
> The proposed patch puts the minimal constraints on the numa balancer to
> respect sched_domains; but doesn't yet correctly deal with hotplug.

I was also thinking about hotplug. Also, neither your proposed patch
nor mine seems to handle the scenario below.

# cat /sys/devices/system/cpu/possible
0-31
# cat /sys/devices/system/cpu/isolated
1,5,9,13
# cat hist.sh
echo 0 > /proc/sys/kernel/numa_balancing
cd /sys/fs/cgroup/cpuset
mkdir -p student
cp cpuset.mems student/
cd student
echo "0-31" > cpuset.cpus
echo $$ > cgroup.procs 
echo "1-8" > cpuset.cpus
/home/srikar/work/ebizzy-0.3/ebizzy -S 1000 &
PID=$!
sleep 10
pidstat -p $! -t |tail -n +3 |head -n 10
pidstat -p $$ -t |tail -n +3
pkill ebizzy
#
# ./hist.sh
10:35:21  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:35:21  IST     0      2645         -    8.70    0.01    0.00    8.71     1  ebizzy
10:35:21  IST     0         -      2645    0.01    0.00    0.00    0.01     1  |__ebizzy
10:35:21  IST     0         -      2647    0.14    0.00    0.00    0.14     1  |__ebizzy
10:35:21  IST     0         -      2648    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2649    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2650    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2651    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2652    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:21  IST     0         -      2653    0.13    0.00    0.00    0.13     1  |__ebizzy
10:35:23  IST   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
10:35:23  IST     0      2642         -    0.00    0.00    0.00    0.00     1  hist.sh
10:35:23  IST     0         -      2642    0.00    0.00    0.00    0.00     1  |__hist.sh
#

Note that all the ebizzy threads, and the bash task that started them,
are on CPU 1. If the cpuset starts with an isolated CPU, all tasks in
that cpuset may end up running only on that CPU. With a smaller cpuset,
ebizzy always stays on CPU 1; as I widen the cpuset, the chance of
ebizzy spreading out increases, but it is not guaranteed.
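
My (possibly wrong) mental model of why the tasks stay parked on CPU 1:
isolated CPUs are attached to no sched_domain, so domain-based load
balancing never considers them. Illustrative sketch only, not the
actual rebalance path:

	/*
	 * On an isolated CPU, rq->sd is NULL, so the domain walk below
	 * is a no-op: nothing is ever balanced to or from that CPU,
	 * and no other CPU has it in its domain span either.
	 */
	static void sketch_rebalance(int cpu)
	{
		struct sched_domain *sd;

		rcu_read_lock();
		for (sd = rcu_dereference(cpu_rq(cpu)->sd); sd; sd = sd->parent) {
			/* balance runqueues within sd's span */
		}
		rcu_read_unlock();
	}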

I have only tried this on a powerpc KVM guest, but I don't think it has
anything to do with the arch, or with running in a guest vs. on the
host.

I have something that seems to help out. Will post soon.

> isolcpus is just one case that goes wrong.
Besides isolcpus, are there other such cases that we need to worry about?

-- 
Thanks and Regards
Srikar Dronamraju

