linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Vaidyanathan Srinivasan <svaidy@linux.ibm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH] nohz: nohz idle balancing per node
Date: Thu, 1 Jul 2021 12:18:18 +0200	[thread overview]
Message-ID: <YN2Wav1CSVq+6cS+@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20210701055323.2199175-1-npiggin@gmail.com>

On Thu, Jul 01, 2021 at 03:53:23PM +1000, Nicholas Piggin wrote:
> Currently a single nohz idle CPU is designated to perform balancing on
> behalf of all other nohz idle CPUs in the system. Implement a per node
> nohz balancer to minimize cross-node memory accesses and runqueue lock
> acquisitions.
> 
> On a 4 node system, this improves performance by 9.3% on a 'pgbench -N'
> with 32 clients/jobs (which is about where throughput maxes out due to
> IO and contention in postgres).

Hmm, Suresh tried something like this around 2010 and then we ran into
trouble that when once node went completely idle and another node was
fully busy, the completely idle node would not run ILB and the node
would forever stay idle.

I've not gone through the code to see if that could still happen -- lots
has changed since, but it is something to consider.

Some further nits below..

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fb469b26b00a..832f8673bba1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5722,13 +5722,27 @@ DEFINE_PER_CPU(cpumask_var_t, select_idle_mask);
>  
>  #ifdef CONFIG_NO_HZ_COMMON
>  
> -static struct {
> +struct nohz {
>  	cpumask_var_t idle_cpus_mask;
>  	atomic_t nr_cpus;
>  	int has_blocked;		/* Idle CPUS has blocked load */
>  	unsigned long next_balance;     /* in jiffy units */
>  	unsigned long next_blocked;	/* Next update of blocked load in jiffies */
> -} nohz ____cacheline_aligned;
> +} ____cacheline_aligned;
> +
> +static struct nohz **nohz_nodes __ro_after_init;
> +
> +static struct nohz *get_nohz(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> +	/*
> +	 * May not have a house keeping CPU per node, do global idle balancing.
> +	 */
> +	if (static_branch_unlikely(&housekeeping_overridden))
> +		return nohz_nodes[0];
> +#endif
> +	return nohz_nodes[numa_node_id()];
> +}

*urgh* I'm not a fan of that isolation/hk behaviour. Even when the HK
mask allows for a CPU per node, this code won't DTRT. Do we want a
KERN_INFO message that performance will suffer here?

>  #endif /* CONFIG_NO_HZ_COMMON */
>  
> @@ -10291,14 +10305,14 @@ static void nohz_balancer_kick(struct rq *rq)
>  	 * None are in tickless mode and hence no need for NOHZ idle load
>  	 * balancing.
>  	 */
> -	if (likely(!atomic_read(&nohz.nr_cpus)))
> +	if (likely(!atomic_read(&get_nohz()->nr_cpus)))
>  		return;
>  
> -	if (READ_ONCE(nohz.has_blocked) &&
> -	    time_after(now, READ_ONCE(nohz.next_blocked)))
> +	if (READ_ONCE(get_nohz()->has_blocked) &&
> +	    time_after(now, READ_ONCE(get_nohz()->next_blocked)))
>  		flags = NOHZ_STATS_KICK;
>  
> -	if (time_before(now, nohz.next_balance))
> +	if (time_before(now, get_nohz()->next_balance))
>  		goto out;
>  
>  	if (rq->nr_running >= 2) {

Also, stuff like the above, will result in horrific code-gen, because
it cannot CSE get_nohz().

  reply	other threads:[~2021-07-01 10:19 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-01  5:53 [PATCH] nohz: nohz idle balancing per node Nicholas Piggin
2021-07-01 10:18 ` Peter Zijlstra [this message]
2021-07-01 13:11   ` Mel Gorman
2021-07-01 23:33     ` Nicholas Piggin
2021-07-02  8:49       ` Mel Gorman
2021-07-01 23:20   ` Nicholas Piggin
2021-07-01 16:41 ` kernel test robot
2021-07-01 16:48 ` kernel test robot
2021-07-07  7:50 ` Vincent Guittot
2021-07-12 13:57   ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YN2Wav1CSVq+6cS+@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=fweisbec@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=svaidy@linux.ibm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).