linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Christoph Lameter <cl@linux.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
Date: Wed, 27 Jan 2016 18:03:54 +0100	[thread overview]
Message-ID: <1453914234.3517.32.camel@gmail.com> (raw)
In-Reply-To: <20160127164825.GF13951@dhcp22.suse.cz>

On Wed, 2016-01-27 at 17:48 +0100, Michal Hocko wrote:
> On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > Hi Christoph,
> > 
> > While you're fixing that commit up, can you perhaps find a better
> > home
> > for quiet_vmstat()?  It not only munches cycles when switching
> > cross
> > -core mightily, for -rt it injects a sleeping lock into the idle
> > task.
> > 
> >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> >      4.75%  [kernel]       [k] __schedule                  
> >      4.70%  [kernel]       [k] mutex_unlock                
> >      3.14%  [kernel]       [k] __switch_to                 
> 
> What about the following fix?

I haven't done _extensive_ testing, but 4.5-rc1-rt (with NO_HZ_FULL
enabling hacks restored, as it's otherwise a dead .config item) is
gripe free whether nohz_full CPUs are in use or not, and overhead went
away in it and desktop configured master kernel (PREEMPT_VOLUNTARY).

> ---
> From c74a04c4fdfe1fa67933bb1ac83d3de0532aaab2 Mon Sep 17 00:00:00
> 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 27 Jan 2016 17:24:22 +0100
> Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter
> 
> Mike has reported a considerable overhead of refresh_cpu_vm_stats
> from
> the idle entry during pipe test:
>     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
>      4.75%  [kernel]       [k] __schedule
>      4.70%  [kernel]       [k] mutex_unlock
>      3.14%  [kernel]       [k] __switch_to
> 
> This is caused by 0eb77e988032 ("vmstat: make vmstat_updater
> deferrable
> again and shut down on idle") which has placed quiet_vmstat into
> cpu_idle_loop. The main reason here seems to be that the idle entry
> has
> to get over all zones and perform atomic operations for each vmstat
> entry even though there might be no per cpu diffs. This is a
> pointless
> overhead for _each_ idle entry.
> 
> Make sure that quiet_vmstat is as light as possible.
> 
>  First of all it doesn't make any sense to do any local sync if the
> current cpu is already set in oncpu_stat_off because vmstat_update
> puts
> itself there only if there is nothing to do.
> 
> Then we can check need_update which should be a cheap way to check
> for
> potential per-cpu diffs and only then do refresh_cpu_vm_stats.
> 
> The original patch also did cancel_delayed_work which we are not
> doing
> here. There are two reasons for that. Firstly cancel_delayed_work
> from
> idle context will blow up on RT kernels (reported by Mike):
> [    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
> [    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS
> M7848W08.20C 09/23/2013
> [    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2
> 0000000000000000
> [    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0
> ffff88041ec501e0
> [    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0
> ffff88040b01fe88
> [    2.283941] Call Trace:
> [    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
> [    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
> [    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
> [    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
> [    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0
> [    2.289094]  [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0
> [    2.289949]  [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0
> [    2.290800]  [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20
> [    2.291647]  [<ffffffff81036164>] start_secondary+0x114/0x140
> 
> And secondly, even on !RT kernels it might add some non trivial
> overhead
> which is not necessary. Even if the vmstat worker wakes up and
> preempts
> idle then it will be most likely a single shot noop because the stats
> were already synced and so it would end up on the oncpu_stat_off
> anyway.
> We just need to teach both vmstat_shepherd and vmstat_update to stop
> scheduling the worker if there is nothing to do.
> 
> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmstat.c | 60 ++++++++++++++++++++++++++++++++++++++++-----------
> ---------
>  1 file changed, 40 insertions(+), 20 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 40b2c74ddf16..7747f85537b6 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct
> *w)
>  		 * Counters were updated so we expect more updates
>  		 * to occur in the future. Keep on running the
>  		 * update worker thread.
> +		 * If we were marked on cpu_stat_off clear the flag
> +		 * so that vmstat_shepherd doesn't schedule us
> again.
>  		 */
> -		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
> -			this_cpu_ptr(&vmstat_work),
> -			round_jiffies_relative(sysctl_stat_interval)
> );
> +		if (!cpumask_test_and_clear_cpu(smp_processor_id(),
> +						cpu_stat_off)) {
> +			queue_delayed_work_on(smp_processor_id(),
> vmstat_wq,
> +				this_cpu_ptr(&vmstat_work),
> +				round_jiffies_relative(sysctl_stat_i
> nterval));
> +		}
>  	} else {
>  		/*
>  		 * We did not update any counters so the app may be
> in
> @@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct
> *w)
>   * until the diffs stay at zero. The function is used by NOHZ and
> can only be
>   * invoked when tick processing is not active.
>   */
> -void quiet_vmstat(void)
> -{
> -	if (system_state != SYSTEM_RUNNING)
> -		return;
> -
> -	do {
> -		if (!cpumask_test_and_set_cpu(smp_processor_id(),
> cpu_stat_off))
> -			cancel_delayed_work(this_cpu_ptr(&vmstat_wor
> k));
> -
> -	} while (refresh_cpu_vm_stats(false));
> -}
> -
>  /*
>   * Check if the diffs for a certain cpu indicate that
>   * an update is needed.
> @@ -1452,6 +1445,30 @@ static bool need_update(int cpu)
>  	return false;
>  }
>  
> +void quiet_vmstat(void)
> +{
> +	if (system_state != SYSTEM_RUNNING)
> +		return;
> +
> +	/*
> +	 * If we are already in hands of the shepherd then there
> +	 * is nothing for us to do here.
> +	 */
> +	if (cpumask_test_and_set_cpu(smp_processor_id(),
> cpu_stat_off))
> +		return;
> +
> +	if (!need_update(smp_processor_id()))
> +		return;
> +
> +	/*
> +	 * Just refresh counters and do not care about the pending
> delayed
> +	 * vmstat_update. It doesn't fire that often to matter and
> canceling
> +	 * it would be too expensive from this path.
> +	 * vmstat_shepherd will take care about that for us.
> +	 */
> +	refresh_cpu_vm_stats(false);
> +}
> +
>  
>  /*
>   * Shepherd worker thread that checks the
> @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct
> work_struct *w)
>  	get_online_cpus();
>  	/* Check processors whose vmstat worker threads have been
> disabled */
>  	for_each_cpu(cpu, cpu_stat_off)
> -		if (need_update(cpu) &&
> -			cpumask_test_and_clear_cpu(cpu,
> cpu_stat_off))
> -
> -			queue_delayed_work_on(cpu, vmstat_wq,
> -				&per_cpu(vmstat_work, cpu), 0);
> +		if (need_update(cpu)) {
> +			if (cpumask_test_and_clear_cpu(cpu,
> cpu_stat_off))
> +				queue_delayed_work_on(cpu,
> vmstat_wq,
> +					&per_cpu(vmstat_work, cpu),
> 0);
> +		} else {
> +			cpumask_set_cpu(cpu, cpu_stat_off);
> +			cancel_delayed_work(this_cpu_ptr(&vmstat_wor
> k));
> +		}
>  
>  	put_online_cpus();
>  
> -- 
> 2.7.0.rc3
> 

  reply	other threads:[~2016-01-27 17:04 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-19  0:33 mm, vmstat: kernel BUG at mm/vmstat.c:1408! Sasha Levin
2015-12-21 13:08 ` Christoph Lameter
2015-12-21 20:28   ` Sasha Levin
2015-12-21 21:07     ` Sasha Levin
2015-12-21 21:14       ` Christoph Lameter
2015-12-22 17:21         ` Christoph Lameter
2015-12-24 20:14         ` Sasha Levin
2015-12-29 17:01           ` Christoph Lameter
2015-12-29 17:18             ` Christoph Lameter
2016-01-04 18:05           ` Christoph Lameter
2016-01-04 18:46             ` Sasha Levin
2016-01-12 11:31   ` Shiraz Hashim
2016-01-12 12:23     ` Christoph Lameter
2016-01-12 12:27       ` Shiraz Hashim
2016-01-13 11:36       ` Shiraz Hashim
2016-01-13 12:32         ` Shiraz Hashim
2016-01-14 21:06         ` Sasha Levin
2016-01-20 14:37 ` Michal Hocko
2016-01-20 14:56   ` Sasha Levin
2016-01-20 15:10     ` Michal Hocko
2016-01-20 15:20       ` Christoph Lameter
2016-01-20 15:49         ` Sasha Levin
2016-01-20 15:55           ` Christoph Lameter
2016-01-20 21:28             ` Michal Hocko
2016-01-20 21:57               ` Christoph Lameter
2016-01-21  8:24                 ` Michal Hocko
2016-01-21 15:45                   ` Christoph Lameter
2016-01-21 16:51                     ` Michal Hocko
2016-01-21 17:38                       ` Christoph Lameter
2016-01-22 11:00                         ` Shiraz Hashim
2016-01-22 14:04                         ` Michal Hocko
2016-01-22 16:07                           ` Christoph Lameter
2016-01-22 16:12                             ` Michal Hocko
2016-01-22 16:46                               ` Christoph Lameter
2016-01-22 17:12                                 ` Michal Hocko
2016-01-23 16:21                                 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith
2016-01-24  0:33                                   ` Christoph Lameter
2016-01-24  2:46                                     ` Mike Galbraith
2016-01-24  3:46                                       ` Christoph Lameter
2016-01-24  5:36                                         ` Mike Galbraith
2016-01-25 17:42                                   ` Michal Hocko
2016-01-25 18:02                                     ` Christoph Lameter
2016-01-25 20:13                                       ` Michal Hocko
2016-01-26 16:25                                         ` Christoph Lameter
2016-01-26 18:31                                           ` Mike Galbraith
2016-01-26 18:34                                             ` Christoph Lameter
2016-01-26 18:45                                               ` Mike Galbraith
2016-01-26 19:20                                                 ` Christoph Lameter
2016-01-27  3:12                                                   ` Mike Galbraith
2016-01-27  4:15                                                     ` Mike Galbraith
2016-01-27 16:28                                                     ` Christoph Lameter
2016-01-28 15:36                                                       ` Frederic Weisbecker
2016-01-28 16:42                                                         ` Christoph Lameter
2016-01-26  2:14                                       ` Mike Galbraith
2016-01-26  2:25                                         ` Mike Galbraith
2016-01-26 16:26                                           ` Christoph Lameter
2016-01-26 17:39                                             ` Mike Galbraith
2016-01-26 18:19                                               ` Christoph Lameter
2016-01-26 16:26                                         ` Christoph Lameter
2016-01-26 17:08                                           ` Mike Galbraith
2016-01-26 18:22                                             ` Christoph Lameter
2016-01-26 19:09                                               ` Mike Galbraith
2016-01-26 19:22                                                 ` Christoph Lameter
2016-01-27 16:48                                   ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko
2016-01-27 17:03                                     ` Mike Galbraith [this message]
2016-01-27 18:26                                     ` Christoph Lameter
2016-01-28 15:21                                       ` Michal Hocko
2016-01-28 16:40                                         ` Christoph Lameter
2016-01-28 16:53                                           ` Michal Hocko
2016-01-28 17:05                                             ` Christoph Lameter
2016-01-28 15:31                                       ` Michal Hocko
2016-01-28 15:37                                         ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko
2016-01-28 16:48                                           ` Christoph Lameter
2016-01-28 16:42                                         ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter
2016-01-24 16:57                                 ` mm, vmstat: kernel BUG at mm/vmstat.c:1408! Linus Torvalds
2016-01-20 15:14   ` Christoph Lameter
2016-01-20 15:20     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1453914234.3517.32.camel@gmail.com \
    --to=umgwanakikbuti@gmail.com \
    --cc=cl@linux.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).