linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] vmstat updates
@ 2016-01-28 17:17 Michal Hocko
  2016-01-28 17:17 ` [PATCH 1/2] mm, vmstat: make quiet_vmstat lighter Michal Hocko
  2016-01-28 17:17 ` [PATCH 2/2] vmstat: make vmstat_update deferrable Michal Hocko
  0 siblings, 2 replies; 5+ messages in thread
From: Michal Hocko @ 2016-01-28 17:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Cristopher Lameter, Mike Galbraith, linux-mm, LKML

Hi Andrew,
the following two patches are on top of the current mmotm tree and the
first one is a fixup for 0eb77e988032 ("vmstat: make vmstat_updater
deferrable again and shut down on idle") merged during this merge
window.  Mike has encountered issues [1] and the first patch fixes both
issues.  The second patch is a follow up enhancement on top. I guess it
would be better to push this to 4.5 after it warms up in linux-next for
1-2 rc cycles to catch potential fallouts.

[1] http://lkml.kernel.org/r/1453566115.3529.8.camel@gmail.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] mm, vmstat: make quiet_vmstat lighter
  2016-01-28 17:17 [PATCH 0/2] vmstat updates Michal Hocko
@ 2016-01-28 17:17 ` Michal Hocko
  2016-02-02  7:53   ` [PATCH 3/2] mm, vmstat: cancel pending work of the cpu_stat_off CPU Mike Galbraith
  2016-01-28 17:17 ` [PATCH 2/2] vmstat: make vmstat_update deferrable Michal Hocko
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2016-01-28 17:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cristopher Lameter, Mike Galbraith, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Mike has reported a considerable overhead of refresh_cpu_vm_stats from
the idle entry during pipe test:
    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule
     4.70%  [kernel]       [k] mutex_unlock
     3.14%  [kernel]       [k] __switch_to

This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") which has placed quiet_vmstat into
cpu_idle_loop. The main reason here seems to be that the idle entry has
to get over all zones and perform atomic operations for each vmstat
entry even though there might be no per cpu diffs. This is a pointless
overhead for _each_ idle entry.

Make sure that quiet_vmstat is as light as possible.

 First of all it doesn't make any sense to do any local sync if the
current cpu is already set in oncpu_stat_off because vmstat_update puts
itself there only if there is nothing to do.

Then we can check need_update which should be a cheap way to check for
potential per-cpu diffs and only then do refresh_cpu_vm_stats.

The original patch also did cancel_delayed_work which we are not doing
here. There are two reasons for that. Firstly cancel_delayed_work from
idle context will blow up on RT kernels (reported by Mike):
[    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
[    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000
[    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0
[    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88
[    2.283941] Call Trace:
[    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
[    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
[    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
[    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
[    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0
[    2.289094]  [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0
[    2.289949]  [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0
[    2.290800]  [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20
[    2.291647]  [<ffffffff81036164>] start_secondary+0x114/0x140

And secondly, even on !RT kernels it might add some non trivial overhead
which is not necessary. Even if the vmstat worker wakes up and preempts
idle then it will be most likely a single shot noop because the stats
were already synced and so it would end up on the oncpu_stat_off anyway.
We just need to teach both vmstat_shepherd and vmstat_update to stop
scheduling the worker if there is nothing to do.

Acked-by: Christoph Lameter <cl@linux.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmstat.c | 64 ++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2c74ddf16..eb30bf45bd55 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct *w)
 		 * Counters were updated so we expect more updates
 		 * to occur in the future. Keep on running the
 		 * update worker thread.
+		 * If we were marked on cpu_stat_off clear the flag
+		 * so that vmstat_shepherd doesn't schedule us again.
 		 */
-		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
-			this_cpu_ptr(&vmstat_work),
-			round_jiffies_relative(sysctl_stat_interval));
+		if (!cpumask_test_and_clear_cpu(smp_processor_id(),
+						cpu_stat_off)) {
+			queue_delayed_work_on(smp_processor_id(), vmstat_wq,
+				this_cpu_ptr(&vmstat_work),
+				round_jiffies_relative(sysctl_stat_interval));
+		}
 	} else {
 		/*
 		 * We did not update any counters so the app may be in
@@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct *w)
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
  */
-void quiet_vmstat(void)
-{
-	if (system_state != SYSTEM_RUNNING)
-		return;
-
-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
-}
-
 /*
  * Check if the diffs for a certain cpu indicate that
  * an update is needed.
@@ -1452,6 +1445,30 @@ static bool need_update(int cpu)
 	return false;
 }
 
+void quiet_vmstat(void)
+{
+	if (system_state != SYSTEM_RUNNING)
+		return;
+
+	/*
+	 * If we are already in hands of the shepherd then there
+	 * is nothing for us to do here.
+	 */
+	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+		return;
+
+	if (!need_update(smp_processor_id()))
+		return;
+
+	/*
+	 * Just refresh counters and do not care about the pending delayed
+	 * vmstat_update. It doesn't fire that often to matter and canceling
+	 * it would be too expensive from this path.
+	 * vmstat_shepherd will take care about that for us.
+	 */
+	refresh_cpu_vm_stats(false);
+}
+
 
 /*
  * Shepherd worker thread that checks the
@@ -1470,11 +1487,18 @@ static void vmstat_shepherd(struct work_struct *w)
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
-		if (need_update(cpu) &&
-			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
-
-			queue_delayed_work_on(cpu, vmstat_wq,
-				&per_cpu(vmstat_work, cpu), 0);
+		if (need_update(cpu)) {
+			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+				queue_delayed_work_on(cpu, vmstat_wq,
+					&per_cpu(vmstat_work, cpu), 0);
+		} else {
+			/*
+			 * Cancel the work if quiet_vmstat has put this
+			 * cpu on cpu_stat_off because the work item might
+			 * be still scheduled
+			 */
+			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+		}
 
 	put_online_cpus();
 
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] vmstat: make vmstat_update deferrable
  2016-01-28 17:17 [PATCH 0/2] vmstat updates Michal Hocko
  2016-01-28 17:17 ` [PATCH 1/2] mm, vmstat: make quiet_vmstat lighter Michal Hocko
@ 2016-01-28 17:17 ` Michal Hocko
  1 sibling, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2016-01-28 17:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Cristopher Lameter, Mike Galbraith, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut
down on idle") made vmstat_shepherd deferrable. vmstat_update itself
is still useing standard timer which might interrupt idle task. This
is possible because "mm, vmstat: make quiet_vmstat lighter" removed
cancel_delayed_work from the quiet_vmstat. Change vmstat_work to
use DEFERRABLE_WORK to prevent from pointless wakeups from the idle
context.

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmstat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index eb30bf45bd55..69537d2be6f6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
+		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
 			vmstat_update);
 
 	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/2] mm, vmstat: cancel pending work of the cpu_stat_off CPU
  2016-01-28 17:17 ` [PATCH 1/2] mm, vmstat: make quiet_vmstat lighter Michal Hocko
@ 2016-02-02  7:53   ` Mike Galbraith
  2016-02-02  8:18     ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: Mike Galbraith @ 2016-02-02  7:53 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: Cristopher Lameter, Mike Galbraith, linux-mm, LKML, Michal Hocko

Cancel pending work of the cpu_stat_off CPU.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
---
 mm/vmstat.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1486,25 +1486,25 @@ static void vmstat_shepherd(struct work_
 
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
-	for_each_cpu(cpu, cpu_stat_off)
+	for_each_cpu(cpu, cpu_stat_off) {
+		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
+
 		if (need_update(cpu)) {
 			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
-				queue_delayed_work_on(cpu, vmstat_wq,
-					&per_cpu(vmstat_work, cpu), 0);
+				queue_delayed_work_on(cpu, vmstat_wq, dw, 0);
 		} else {
 			/*
 			 * Cancel the work if quiet_vmstat has put this
 			 * cpu on cpu_stat_off because the work item might
 			 * be still scheduled
 			 */
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+			cancel_delayed_work(dw);
 		}
-
+	}
 	put_online_cpus();
 
 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
-
 }
 
 static void __init start_shepherd_timer(void)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 3/2] mm, vmstat: cancel pending work of the cpu_stat_off CPU
  2016-02-02  7:53   ` [PATCH 3/2] mm, vmstat: cancel pending work of the cpu_stat_off CPU Mike Galbraith
@ 2016-02-02  8:18     ` Michal Hocko
  0 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2016-02-02  8:18 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Andrew Morton, Cristopher Lameter, Mike Galbraith, linux-mm, LKML

On Tue 02-02-16 08:53:25, Mike Galbraith wrote:
> Cancel pending work of the cpu_stat_off CPU.

Thanks for catching that Mike. This was a last minute change and I
screwed it...

Andrew could you fold this into the original patch, please?

Thanks!

> Signed-off-by: Mike Galbraith <mgalbraith@suse.de>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmstat.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1486,25 +1486,25 @@ static void vmstat_shepherd(struct work_
>  
>  	get_online_cpus();
>  	/* Check processors whose vmstat worker threads have been disabled */
> -	for_each_cpu(cpu, cpu_stat_off)
> +	for_each_cpu(cpu, cpu_stat_off) {
> +		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
> +
>  		if (need_update(cpu)) {
>  			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> -				queue_delayed_work_on(cpu, vmstat_wq,
> -					&per_cpu(vmstat_work, cpu), 0);
> +				queue_delayed_work_on(cpu, vmstat_wq, dw, 0);
>  		} else {
>  			/*
>  			 * Cancel the work if quiet_vmstat has put this
>  			 * cpu on cpu_stat_off because the work item might
>  			 * be still scheduled
>  			 */
> -			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> +			cancel_delayed_work(dw);
>  		}
> -
> +	}
>  	put_online_cpus();
>  
>  	schedule_delayed_work(&shepherd,
>  		round_jiffies_relative(sysctl_stat_interval));
> -
>  }
>  
>  static void __init start_shepherd_timer(void)

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-02-02  8:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-28 17:17 [PATCH 0/2] vmstat updates Michal Hocko
2016-01-28 17:17 ` [PATCH 1/2] mm, vmstat: make quiet_vmstat lighter Michal Hocko
2016-02-02  7:53   ` [PATCH 3/2] mm, vmstat: cancel pending work of the cpu_stat_off CPU Mike Galbraith
2016-02-02  8:18     ` Michal Hocko
2016-01-28 17:17 ` [PATCH 2/2] vmstat: make vmstat_update deferrable Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).