linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 4.4 0/2] vmstat backports
@ 2019-07-29 15:40 Daniel Wagner
  2019-07-29 15:40 ` [PATCH 4.4 1/2] vmstat: Remove BUG_ON from vmstat_update Daniel Wagner
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Daniel Wagner @ 2019-07-29 15:40 UTC (permalink / raw)
  To: stable, linux-kernel; +Cc: Greg KH, Jon Hunter

Hi Greg,

Second attempt on this topic [1]:

"""
Upstream commmit 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") was back ported in v4.4.178
(bdf3c006b9a2). For -rt we definitely need the bugfix f01f17d3705b
("mm, vmstat: make quiet_vmstat lighter") as well.

Since the offending patch was back ported to v4.4 stable only, the
other stable branches don't need an update (offending patch and bug
fix are already in).
"""

Though I missed a dependency as Jon noted[2]. The missing patch is
587198ba5206 ("vmstat: Remove BUG_ON from vmstat_update"). I've tested
this on a Tegra K1 one board which exposed the bug. With this should
be fine.

While at it, I looked on all relevant changes for
vmstat_updated(). These two patches are the only relevant changes
which are missing. It seems almost all changes from mainline have made
it back to v4.

Could you please queue the above patches for v4.4.y?

Thanks,
Daniel

[1] https://lore.kernel.org/stable/20190513061237.4915-1-wagi@monom.org
[2] https://lore.kernel.org/stable/f32de22f-c928-2eaa-ee3f-d2b26c184dd4@nvidia.com


Christoph Lameter (1):
  vmstat: Remove BUG_ON from vmstat_update

Michal Hocko (1):
  mm, vmstat: make quiet_vmstat lighter

 mm/vmstat.c | 80 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 33 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 4.4 1/2] vmstat: Remove BUG_ON from vmstat_update
  2019-07-29 15:40 [PATCH 4.4 0/2] vmstat backports Daniel Wagner
@ 2019-07-29 15:40 ` Daniel Wagner
  2019-07-29 15:40 ` [PATCH 4.4 2/2] mm, vmstat: make quiet_vmstat lighter Daniel Wagner
  2019-07-29 18:01 ` [PATCH 4.4 0/2] vmstat backports Greg KH
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Wagner @ 2019-07-29 15:40 UTC (permalink / raw)
  To: stable, linux-kernel; +Cc: Greg KH, Jon Hunter

From: Christoph Lameter <cl@linux.com>

[ Upstream commit 587198ba5206cdf0d30855f7361af950a4172cd6 ]

If we detect that there is nothing to do just set the flag and do not
check if it was already set before.  Races really do not matter.  If the
flag is set by any code then the shepherd will start dealing with the
situation and reenable the vmstat workers when necessary again.

Since commit 0eb77e988032 ("vmstat: make vmstat_updater deferrable again
and shut down on idle") quiet_vmstat might update cpu_stat_off and mark
a particular cpu to be handled by vmstat_shepherd.  This might trigger a
VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up.  The VM_BUG_ON is therefore misleading and no more
appropriate.  Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly.  Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Wagner <wagi@monom.org>
---
 mm/vmstat.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index dd0a13013cb4..233045057a30 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1407,17 +1407,7 @@ static void vmstat_update(struct work_struct *w)
 		 * Defer the checking for differentials to the
 		 * shepherd thread on a different processor.
 		 */
-		int r;
-		/*
-		 * Shepherd work thread does not race since it never
-		 * changes the bit if its zero but the cpu
-		 * online / off line code may race if
-		 * worker threads are still allowed during
-		 * shutdown / startup.
-		 */
-		r = cpumask_test_and_set_cpu(smp_processor_id(),
-			cpu_stat_off);
-		VM_BUG_ON(r);
+		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 	}
 }
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 4.4 2/2] mm, vmstat: make quiet_vmstat lighter
  2019-07-29 15:40 [PATCH 4.4 0/2] vmstat backports Daniel Wagner
  2019-07-29 15:40 ` [PATCH 4.4 1/2] vmstat: Remove BUG_ON from vmstat_update Daniel Wagner
@ 2019-07-29 15:40 ` Daniel Wagner
  2019-07-29 18:01 ` [PATCH 4.4 0/2] vmstat backports Greg KH
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Wagner @ 2019-07-29 15:40 UTC (permalink / raw)
  To: stable, linux-kernel; +Cc: Greg KH, Jon Hunter

From: Michal Hocko <mhocko@suse.com>

[ Upstream commit f01f17d3705bb6081c9e5728078f64067982be36 ]

Mike has reported a considerable overhead of refresh_cpu_vm_stats from
the idle entry during pipe test:

    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule
     4.70%  [kernel]       [k] mutex_unlock
     3.14%  [kernel]       [k] __switch_to

This is caused by commit 0eb77e988032 ("vmstat: make vmstat_updater
deferrable again and shut down on idle") which has placed quiet_vmstat
into cpu_idle_loop.  The main reason here seems to be that the idle
entry has to get over all zones and perform atomic operations for each
vmstat entry even though there might be no per cpu diffs.  This is a
pointless overhead for _each_ idle entry.

Make sure that quiet_vmstat is as light as possible.

First of all it doesn't make any sense to do any local sync if the
current cpu is already set in oncpu_stat_off because vmstat_update puts
itself there only if there is nothing to do.

Then we can check need_update which should be a cheap way to check for
potential per-cpu diffs and only then do refresh_cpu_vm_stats.

The original patch also did cancel_delayed_work which we are not doing
here.  There are two reasons for that.  Firstly cancel_delayed_work from
idle context will blow up on RT kernels (reported by Mike):

  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
  Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
  Call Trace:
    dump_stack+0x49/0x67
    ___might_sleep+0xf5/0x180
    rt_spin_lock+0x20/0x50
    try_to_grab_pending+0x69/0x240
    cancel_delayed_work+0x26/0xe0
    quiet_vmstat+0x75/0xa0
    cpu_idle_loop+0x38/0x3e0
    cpu_startup_entry+0x13/0x20
    start_secondary+0x114/0x140

And secondly, even on !RT kernels it might add some non trivial overhead
which is not necessary.  Even if the vmstat worker wakes up and preempts
idle then it will be most likely a single shot noop because the stats
were already synced and so it would end up on the oncpu_stat_off anyway.
We just need to teach both vmstat_shepherd and vmstat_update to stop
scheduling the worker if there is nothing to do.

[mgalbraith@suse.de: cancel pending work of the cpu_stat_off CPU]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Wagner <wagi@monom.org>
---
 mm/vmstat.c | 68 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 46 insertions(+), 22 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 233045057a30..59e131e82b81 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1395,10 +1395,15 @@ static void vmstat_update(struct work_struct *w)
 		 * Counters were updated so we expect more updates
 		 * to occur in the future. Keep on running the
 		 * update worker thread.
+		 * If we were marked on cpu_stat_off clear the flag
+		 * so that vmstat_shepherd doesn't schedule us again.
 		 */
-		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
-			this_cpu_ptr(&vmstat_work),
-			round_jiffies_relative(sysctl_stat_interval));
+		if (!cpumask_test_and_clear_cpu(smp_processor_id(),
+						cpu_stat_off)) {
+			queue_delayed_work_on(smp_processor_id(), vmstat_wq,
+				this_cpu_ptr(&vmstat_work),
+				round_jiffies_relative(sysctl_stat_interval));
+		}
 	} else {
 		/*
 		 * We did not update any counters so the app may be in
@@ -1416,18 +1421,6 @@ static void vmstat_update(struct work_struct *w)
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
  */
-void quiet_vmstat(void)
-{
-	if (system_state != SYSTEM_RUNNING)
-		return;
-
-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
-}
-
 /*
  * Check if the diffs for a certain cpu indicate that
  * an update is needed.
@@ -1451,6 +1444,30 @@ static bool need_update(int cpu)
 	return false;
 }
 
+void quiet_vmstat(void)
+{
+	if (system_state != SYSTEM_RUNNING)
+		return;
+
+	/*
+	 * If we are already in hands of the shepherd then there
+	 * is nothing for us to do here.
+	 */
+	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+		return;
+
+	if (!need_update(smp_processor_id()))
+		return;
+
+	/*
+	 * Just refresh counters and do not care about the pending delayed
+	 * vmstat_update. It doesn't fire that often to matter and canceling
+	 * it would be too expensive from this path.
+	 * vmstat_shepherd will take care about that for us.
+	 */
+	refresh_cpu_vm_stats(false);
+}
+
 
 /*
  * Shepherd worker thread that checks the
@@ -1468,18 +1485,25 @@ static void vmstat_shepherd(struct work_struct *w)
 
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
-	for_each_cpu(cpu, cpu_stat_off)
-		if (need_update(cpu) &&
-			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
-
-			queue_delayed_work_on(cpu, vmstat_wq,
-				&per_cpu(vmstat_work, cpu), 0);
+	for_each_cpu(cpu, cpu_stat_off) {
+		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
+		if (need_update(cpu)) {
+			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+				queue_delayed_work_on(cpu, vmstat_wq, dw, 0);
+		} else {
+			/*
+			 * Cancel the work if quiet_vmstat has put this
+			 * cpu on cpu_stat_off because the work item might
+			 * be still scheduled
+			 */
+			cancel_delayed_work(dw);
+		}
+	}
 	put_online_cpus();
 
 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
-
 }
 
 static void __init start_shepherd_timer(void)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 4.4 0/2] vmstat backports
  2019-07-29 15:40 [PATCH 4.4 0/2] vmstat backports Daniel Wagner
  2019-07-29 15:40 ` [PATCH 4.4 1/2] vmstat: Remove BUG_ON from vmstat_update Daniel Wagner
  2019-07-29 15:40 ` [PATCH 4.4 2/2] mm, vmstat: make quiet_vmstat lighter Daniel Wagner
@ 2019-07-29 18:01 ` Greg KH
  2 siblings, 0 replies; 4+ messages in thread
From: Greg KH @ 2019-07-29 18:01 UTC (permalink / raw)
  To: Daniel Wagner; +Cc: stable, linux-kernel, Jon Hunter

On Mon, Jul 29, 2019 at 05:40:44PM +0200, Daniel Wagner wrote:
> Hi Greg,
> 
> Second attempt on this topic [1]:
> 
> """
> Upstream commmit 0eb77e988032 ("vmstat: make vmstat_updater deferrable
> again and shut down on idle") was back ported in v4.4.178
> (bdf3c006b9a2). For -rt we definitely need the bugfix f01f17d3705b
> ("mm, vmstat: make quiet_vmstat lighter") as well.
> 
> Since the offending patch was back ported to v4.4 stable only, the
> other stable branches don't need an update (offending patch and bug
> fix are already in).
> """
> 
> Though I missed a dependency as Jon noted[2]. The missing patch is
> 587198ba5206 ("vmstat: Remove BUG_ON from vmstat_update"). I've tested
> this on a Tegra K1 one board which exposed the bug. With this should
> be fine.
> 
> While at it, I looked on all relevant changes for
> vmstat_updated(). These two patches are the only relevant changes
> which are missing. It seems almost all changes from mainline have made
> it back to v4.
> 
> Could you please queue the above patches for v4.4.y?
> 
> Thanks,
> Daniel
> 
> [1] https://lore.kernel.org/stable/20190513061237.4915-1-wagi@monom.org
> [2] https://lore.kernel.org/stable/f32de22f-c928-2eaa-ee3f-d2b26c184dd4@nvidia.com
> 
> 
> Christoph Lameter (1):
>   vmstat: Remove BUG_ON from vmstat_update
> 
> Michal Hocko (1):
>   mm, vmstat: make quiet_vmstat lighter
> 
>  mm/vmstat.c | 80 +++++++++++++++++++++++++++++++----------------------
>  1 file changed, 47 insertions(+), 33 deletions(-)

Now queued up.

greg k-h

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-07-29 18:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29 15:40 [PATCH 4.4 0/2] vmstat backports Daniel Wagner
2019-07-29 15:40 ` [PATCH 4.4 1/2] vmstat: Remove BUG_ON from vmstat_update Daniel Wagner
2019-07-29 15:40 ` [PATCH 4.4 2/2] mm, vmstat: make quiet_vmstat lighter Daniel Wagner
2019-07-29 18:01 ` [PATCH 4.4 0/2] vmstat backports Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).