linux-kernel.vger.kernel.org archive mirror
* [patch 0/3] vmstat: Various enhancements
@ 2015-10-28  2:41 Christoph Lameter
  2015-10-28  2:41 ` [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats Christoph Lameter
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28  2:41 UTC (permalink / raw)
  To: akpm
  Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
	torvalds, hannes, mgorman

This addresses a couple of issues that came up last week in
the discussion about the blocking of the execution of vmstat
updates.

1. Make vmstat update execution deferrable again so that no
   special tick is generated just for vmstat. The vmstat
   updater is quieted down when a processor enters idle mode,
   which means that no differentials remain while the
   processor is idle.

2. Create a separate workqueue so that the vmstat updater
   is not blocked by other work requests. This creates a
   new kernel thread <sigh> and avoids the issue of
   differentials not being folded in a timely fashion.
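
For orientation, here is a rough sketch of how the two changes are meant
to interact once the series is applied (illustrative only; the actual
hooks are in the patches that follow):

	/*
	 *  CPU going idle (patch 2/3)          shepherd, on some other CPU
	 *  --------------------------          ---------------------------
	 *  tick_nohz_stop_sched_tick()         vmstat_shepherd()
	 *    quiet_vmstat()                      for each cpu in cpu_stat_off:
	 *      mark this cpu in cpu_stat_off        if (need_update(cpu))
	 *      cancel its pending vmstat_work           queue its vmstat_work
	 *      fold the remaining differentials         on the dedicated vmstat_wq
	 */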



* [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats
  2015-10-28  2:41 [patch 0/3] vmstat: Various enhancements Christoph Lameter
@ 2015-10-28  2:41 ` Christoph Lameter
  2015-10-28  2:41 ` [patch 2/3] vmstat: make vmstat_updater deferrable again and shut down on idle Christoph Lameter
  2015-10-28  2:41 ` [patch 3/3] vmstat: Create our own workqueue Christoph Lameter
  2 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28  2:41 UTC (permalink / raw)
  To: akpm
  Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
	torvalds, hannes, mgorman

[-- Attachment #1: vmstat_do_pageset_parameter --]
[-- Type: text/plain, Size: 2354 bytes --]

Add a parameter to refresh_cpu_vm_stats() to make pageset expiration
optional. Flushing of the pagesets is also performed by the page
allocator, so processing the pagesets may not be wanted when the
intent is only to fold the differentials.

Signed-off-by: Christoph Lameter <cl@linux.com>
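
To illustrate the intended calling convention (a sketch only, not part
of the patch; refresh_cpu_vm_stats(true) is wired up in the hunk below,
while the refresh_cpu_vm_stats(false) caller, quiet_vmstat(), is
introduced in patch 2/3):

	/* Periodic worker: fold the per-cpu differentials and also expire
	 * and drain remote pagesets; cond_resched() stays in this path. */
	if (refresh_cpu_vm_stats(true)) {
		/* Counters changed, keep the update worker running. */
	}

	/* Idle entry (patch 2/3): fold the differentials only, until they
	 * stay at zero; pageset draining is left to the page allocator. */
	while (refresh_cpu_vm_stats(false))
		;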

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -460,7 +460,7 @@ static int fold_diff(int *diff)
  *
  * The function returns the number of global counters updated.
  */
-static int refresh_cpu_vm_stats(void)
+static int refresh_cpu_vm_stats(bool do_pagesets)
 {
 	struct zone *zone;
 	int i;
@@ -484,33 +484,35 @@ static int refresh_cpu_vm_stats(void)
 #endif
 			}
 		}
-		cond_resched();
 #ifdef CONFIG_NUMA
-		/*
-		 * Deal with draining the remote pageset of this
-		 * processor
-		 *
-		 * Check if there are pages remaining in this pageset
-		 * if not then there is nothing to expire.
-		 */
-		if (!__this_cpu_read(p->expire) ||
+		if (do_pagesets) {
+			cond_resched();
+			/*
+			 * Deal with draining the remote pageset of this
+			 * processor
+			 *
+			 * Check if there are pages remaining in this pageset
+			 * if not then there is nothing to expire.
+			 */
+			if (!__this_cpu_read(p->expire) ||
 			       !__this_cpu_read(p->pcp.count))
-			continue;
+				continue;
 
-		/*
-		 * We never drain zones local to this processor.
-		 */
-		if (zone_to_nid(zone) == numa_node_id()) {
-			__this_cpu_write(p->expire, 0);
-			continue;
-		}
+			/*
+			 * We never drain zones local to this processor.
+			 */
+			if (zone_to_nid(zone) == numa_node_id()) {
+				__this_cpu_write(p->expire, 0);
+				continue;
+			}
 
-		if (__this_cpu_dec_return(p->expire))
-			continue;
+			if (__this_cpu_dec_return(p->expire))
+				continue;
 
-		if (__this_cpu_read(p->pcp.count)) {
-			drain_zone_pages(zone, this_cpu_ptr(&p->pcp));
-			changes++;
+			if (__this_cpu_read(p->pcp.count)) {
+				drain_zone_pages(zone, this_cpu_ptr(&p->pcp));
+				changes++;
+			}
 		}
 #endif
 	}
@@ -1363,7 +1365,7 @@ static cpumask_var_t cpu_stat_off;
 
 static void vmstat_update(struct work_struct *w)
 {
-	if (refresh_cpu_vm_stats()) {
+	if (refresh_cpu_vm_stats(true)) {
 		/*
 		 * Counters were updated so we expect more updates
 		 * to occur in the future. Keep on running the



* [patch 2/3] vmstat: make vmstat_updater deferrable again and shut down on idle
  2015-10-28  2:41 [patch 0/3] vmstat: Various enhancements Christoph Lameter
  2015-10-28  2:41 ` [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats Christoph Lameter
@ 2015-10-28  2:41 ` Christoph Lameter
  2015-10-28  2:41 ` [patch 3/3] vmstat: Create our own workqueue Christoph Lameter
  2 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28  2:41 UTC (permalink / raw)
  To: akpm
  Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
	torvalds, hannes, mgorman

[-- Attachment #1: vmstat_quiet_on_idle --]
[-- Type: text/plain, Size: 3023 bytes --]

Currently the vmstat updater is not deferrable, as a result of commit
ba4877b9ca51f80b5d30f304a46762f0509e1635. This in turn can cause multiple
interruptions of applications because the vmstat updater may run at
times other than tick processing. No good.

Make vmstat_update deferrable again and provide a function that
shuts down the vmstat updater when we go idle, by folding the differentials.
Shut it down from the load average calculation logic introduced by NOHZ.

Note that the shepherd thread will continue scanning the differentials
from another processor and will reenable the vmstat workers if it
detects any changes.

Fixes: ba4877b9ca51 ("vmstat: do not use deferrable delay")
Signed-off-by: Christoph Lameter <cl@linux.com>
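
For reference, the difference between plain and deferrable delayed work
is that a deferrable work item rides on a deferrable timer, so it does
not wake an idle CPU by itself; it only runs once the CPU wakes up for
some other reason. A minimal sketch of the distinction (illustrative
only; my_update and both work items are made-up names, and the actual
change in this patch is the one-line hunk below):

	#include <linux/workqueue.h>

	static void my_update(struct work_struct *w);

	/* Plain delayed work: its timer fires even on an idle CPU,
	 * forcing that CPU to wake up just to run the work. */
	static DECLARE_DELAYED_WORK(my_work_strict, my_update);

	/* Deferrable delayed work: the timer is lazy, so an idle CPU is
	 * left alone and the work runs on the CPU's next natural wakeup. */
	static DECLARE_DEFERRABLE_WORK(my_work_lazy, my_update);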

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1397,6 +1397,20 @@ static void vmstat_update(struct work_st
 }
 
 /*
+ * Switch off vmstat processing and then fold all the remaining differentials
+ * until the diffs stay at zero. The function is used by NOHZ and can only be
+ * invoked when tick processing is not active.
+ */
+void quiet_vmstat(void)
+{
+	do {
+		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+
+	} while (refresh_cpu_vm_stats(false));
+}
+
+/*
  * Check if the diffs for a certain cpu indicate that
  * an update is needed.
  */
@@ -1428,7 +1442,7 @@ static bool need_update(int cpu)
  */
 static void vmstat_shepherd(struct work_struct *w);
 
-static DECLARE_DELAYED_WORK(shepherd, vmstat_shepherd);
+static DECLARE_DEFERRABLE_WORK(shepherd, vmstat_shepherd);
 
 static void vmstat_shepherd(struct work_struct *w)
 {
Index: linux/include/linux/vmstat.h
===================================================================
--- linux.orig/include/linux/vmstat.h
+++ linux/include/linux/vmstat.h
@@ -211,6 +211,7 @@ extern void __inc_zone_state(struct zone
 extern void dec_zone_state(struct zone *, enum zone_stat_item);
 extern void __dec_zone_state(struct zone *, enum zone_stat_item);
 
+void quiet_vmstat(void);
 void cpu_vm_stats_fold(int cpu);
 void refresh_zone_stat_thresholds(void);
 
@@ -272,6 +273,7 @@ static inline void __dec_zone_page_state
 static inline void refresh_cpu_vm_stats(int cpu) { }
 static inline void refresh_zone_stat_thresholds(void) { }
 static inline void cpu_vm_stats_fold(int cpu) { }
+static inline void quiet_vmstat(void) { }
 
 static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -667,6 +667,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	 */
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
+		quiet_vmstat();
 		calc_load_enter_idle();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);



* [patch 3/3] vmstat: Create our own workqueue
  2015-10-28  2:41 [patch 0/3] vmstat: Various enhancements Christoph Lameter
  2015-10-28  2:41 ` [patch 1/3] vmstat: Make pageset processing optional in refresh_cpu_vm_stats Christoph Lameter
  2015-10-28  2:41 ` [patch 2/3] vmstat: make vmstat_updater deferrable again and shut down on idle Christoph Lameter
@ 2015-10-28  2:41 ` Christoph Lameter
  2015-10-28  2:43   ` Tejun Heo
  2 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28  2:41 UTC (permalink / raw)
  To: akpm
  Cc: Michal Hocko, Tejun Heo, Tetsuo Handa, linux-mm, linux-kernel,
	torvalds, hannes, mgorman

[-- Attachment #1: vmstat_add_workqueue --]
[-- Type: text/plain, Size: 1772 bytes --]

Seems that vmstat needs its own workqueue now, since the general
workqueue mechanism has been *enhanced*, which means that the
vmstat_updates cannot run reliably but are being blocked by
work requests doing memory allocation. This causes vmstat
to be unable to keep the counters up to date.

Bad. Fix this by creating our own workqueue.

Signed-off-by: Christoph Lameter <cl@linux.com>
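
For reference, a minimal usage sketch of a dedicated workqueue with
WQ_MEM_RECLAIM (illustrative only; my_wq, my_work and my_update are
made-up names, and the real call sites are in the hunks below).
WQ_MEM_RECLAIM is presumably the source of the extra kernel thread the
cover letter sighs about: it gives the queue a rescuer thread that
guarantees forward progress for queued items even when no new worker
can be created under memory pressure.

	#include <linux/workqueue.h>

	static struct workqueue_struct *my_wq;
	static struct delayed_work my_work;

	static void my_update(struct work_struct *w)
	{
		/* ... fold counters, requeue if more updates are expected ... */
	}

	static int my_setup(void)
	{
		/* Dedicated queue with its own rescuer thread. */
		my_wq = alloc_workqueue("my_wq", WQ_MEM_RECLAIM, 0);
		if (!my_wq)
			return -ENOMEM;

		INIT_DELAYED_WORK(&my_work, my_update);

		/* Queue on the dedicated queue instead of the shared system
		 * queue that schedule_delayed_work() would use. */
		queue_delayed_work(my_wq, &my_work,
				   round_jiffies_relative(HZ));
		return 0;
	}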

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1359,6 +1359,8 @@ static const struct file_operations proc
 #endif /* CONFIG_PROC_FS */
 
 #ifdef CONFIG_SMP
+static struct workqueue_struct *vmstat_wq;
+
 static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
 int sysctl_stat_interval __read_mostly = HZ;
 static cpumask_var_t cpu_stat_off;
@@ -1371,7 +1373,7 @@ static void vmstat_update(struct work_st
 		 * to occur in the future. Keep on running the
 		 * update worker thread.
 		 */
-		schedule_delayed_work_on(smp_processor_id(),
+		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
 			this_cpu_ptr(&vmstat_work),
 			round_jiffies_relative(sysctl_stat_interval));
 	} else {
@@ -1454,7 +1456,7 @@ static void vmstat_shepherd(struct work_
 		if (need_update(cpu) &&
 			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
 
-			schedule_delayed_work_on(cpu,
+			queue_delayed_work_on(cpu, vmstat_wq,
 				&per_cpu(vmstat_work, cpu), 0);
 
 	put_online_cpus();
@@ -1543,6 +1545,10 @@ static int __init setup_vmstat(void)
 
 	start_shepherd_timer();
 	cpu_notifier_register_done();
+	vmstat_wq = alloc_workqueue("vmstat",
+		WQ_FREEZABLE|
+		WQ_SYSFS|
+		WQ_MEM_RECLAIM, 0);
 #endif
 #ifdef CONFIG_PROC_FS
 	proc_create("buddyinfo", S_IRUGO, NULL, &fragmentation_file_operations);



* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-28  2:41 ` [patch 3/3] vmstat: Create our own workqueue Christoph Lameter
@ 2015-10-28  2:43   ` Tejun Heo
  2015-10-28  3:04     ` Christoph Lameter
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2015-10-28  2:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Michal Hocko, Tetsuo Handa, linux-mm, linux-kernel,
	torvalds, hannes, mgorman

Hello,

On Tue, Oct 27, 2015 at 09:41:17PM -0500, Christoph Lameter wrote:
> +	vmstat_wq = alloc_workqueue("vmstat",
> +		WQ_FREEZABLE|
> +		WQ_SYSFS|
> +		WQ_MEM_RECLAIM, 0);

The only thing necessary here is WQ_MEM_RECLAIM.  I don't see how
WQ_SYSFS and WQ_FREEZABLE make sense here.

Thanks.

-- 
tejun


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-28  2:43   ` Tejun Heo
@ 2015-10-28  3:04     ` Christoph Lameter
  2015-10-28 11:57       ` Tetsuo Handa
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28  3:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: akpm, Michal Hocko, Tetsuo Handa, linux-mm, linux-kernel,
	torvalds, hannes, mgorman

On Wed, 28 Oct 2015, Tejun Heo wrote:

> The only thing necessary here is WQ_MEM_RECLAIM.  I don't see how
> WQ_SYSFS and WQ_FREEZABLE make sense here.


Subject: vmstat: Remove WQ_FREEZABLE and WQ_SYSFS

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1546,8 +1546,6 @@ static int __init setup_vmstat(void)
 	start_shepherd_timer();
 	cpu_notifier_register_done();
 	vmstat_wq = alloc_workqueue("vmstat",
-		WQ_FREEZABLE|
-		WQ_SYSFS|
 		WQ_MEM_RECLAIM, 0);
 #endif
 #ifdef CONFIG_PROC_FS


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-28  3:04     ` Christoph Lameter
@ 2015-10-28 11:57       ` Tetsuo Handa
  2015-10-28 22:32         ` Christoph Lameter
  2015-10-29  2:24         ` Tejun Heo
  0 siblings, 2 replies; 19+ messages in thread
From: Tetsuo Handa @ 2015-10-28 11:57 UTC (permalink / raw)
  To: cl, htejun
  Cc: akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

Christoph Lameter wrote:
> On Wed, 28 Oct 2015, Tejun Heo wrote:
> 
> > The only thing necessary here is WQ_MEM_RECLAIM.  I don't see how
> > WQ_SYSFS and WQ_FREEZABLE make sense here.
> 
I can still trigger silent livelock with this patchset applied.

----------
[  272.283217] MemAlloc-Info: 9 stalling task, 0 dying task, 0 victim task.
[  272.285089] MemAlloc: a.out(11325) gfp=0x24280ca order=0 delay=19164
[  272.286817] MemAlloc: a.out(11326) gfp=0x242014a order=0 delay=19104
[  272.288512] MemAlloc: vmtoolsd(1897) gfp=0x242014a order=0 delay=19072
[  272.290280] MemAlloc: kworker/1:3(11286) gfp=0x2400000 order=0 delay=19056
[  272.292114] MemAlloc: sshd(11202) gfp=0x242014a order=0 delay=18927
[  272.293908] MemAlloc: tuned(2073) gfp=0x242014a order=0 delay=18799
[  272.297360] MemAlloc: nmbd(4752) gfp=0x242014a order=0 delay=16532
[  272.299115] MemAlloc: auditd(529) gfp=0x242014a order=0 delay=13073
[  272.302248] MemAlloc: irqbalance(1696) gfp=0x242014a order=0 delay=10529
(...snipped...)
[  272.851035] Showing busy workqueues and worker pools:
[  272.852583] workqueue events: flags=0x0
[  272.853942]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[  272.855781]     pending: vmw_fb_dirty_flush [vmwgfx]
[  272.857500]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  272.859359]     pending: vmpressure_work_fn
[  272.860840] workqueue events_freezable_power_: flags=0x84
[  272.862461]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
[  272.864479]     in-flight: 11286:disk_events_workfn
[  272.866065]     pending: disk_events_workfn
[  272.867587] workqueue vmstat: flags=0x8
[  272.868942]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  272.870785]     pending: vmstat_update
[  272.872248] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=4 idle: 14 218 43
----------

> 2. Create a separate workqueue so that the vmstat updater
>    is not blocked by other work requests. This creates a
>    new kernel thread <sigh> and avoids the issue of
>    differentials not being folded in a timely fashion.

Did you really mean "the vmstat updater is not blocked by other
work requests"?


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-28 11:57       ` Tetsuo Handa
@ 2015-10-28 22:32         ` Christoph Lameter
  2015-10-29  2:24         ` Tejun Heo
  1 sibling, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-28 22:32 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

On Wed, 28 Oct 2015, Tetsuo Handa wrote:

> Christoph Lameter wrote:
> > On Wed, 28 Oct 2015, Tejun Heo wrote:
> >
> > > The only thing necessary here is WQ_MEM_RECLAIM.  I don't see how
> > > WQ_SYSFS and WQ_FREEZABLE make sense here.
> >
> I can still trigger silent livelock with this patchset applied.

Ok, so why is the vmstat updater still deferred, Tejun?



* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-28 11:57       ` Tetsuo Handa
  2015-10-28 22:32         ` Christoph Lameter
@ 2015-10-29  2:24         ` Tejun Heo
  2015-10-29  3:08           ` Tejun Heo
  1 sibling, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2015-10-29  2:24 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: cl, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

Hello,

That's weird.

On Wed, Oct 28, 2015 at 08:57:28PM +0900, Tetsuo Handa wrote:
> [  272.851035] Showing busy workqueues and worker pools:
> [  272.852583] workqueue events: flags=0x0
> [  272.853942]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
> [  272.855781]     pending: vmw_fb_dirty_flush [vmwgfx]
> [  272.857500]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [  272.859359]     pending: vmpressure_work_fn
> [  272.860840] workqueue events_freezable_power_: flags=0x84
> [  272.862461]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> [  272.864479]     in-flight: 11286:disk_events_workfn

What's this guy doing?  Can you get stack dump on 11286 (or whatever
is in flight in the next lockup)?

> [  272.866065]     pending: disk_events_workfn
> [  272.867587] workqueue vmstat: flags=0x8
> [  272.868942]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [  272.870785]     pending: vmstat_update
> [  272.872248] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=4 idle: 14 218 43

Thanks.

-- 
tejun


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-29  2:24         ` Tejun Heo
@ 2015-10-29  3:08           ` Tejun Heo
  2015-10-30  1:01             ` Christoph Lameter
  0 siblings, 1 reply; 19+ messages in thread
From: Tejun Heo @ 2015-10-29  3:08 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: cl, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

On Thu, Oct 29, 2015 at 11:24:47AM +0900, Tejun Heo wrote:
> Hello,
> 
> That's weird.
> 
> On Wed, Oct 28, 2015 at 08:57:28PM +0900, Tetsuo Handa wrote:
> > [  272.851035] Showing busy workqueues and worker pools:
> > [  272.852583] workqueue events: flags=0x0
> > [  272.853942]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
> > [  272.855781]     pending: vmw_fb_dirty_flush [vmwgfx]
> > [  272.857500]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> > [  272.859359]     pending: vmpressure_work_fn
> > [  272.860840] workqueue events_freezable_power_: flags=0x84
> > [  272.862461]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> > [  272.864479]     in-flight: 11286:disk_events_workfn
> 
> What's this guy doing?  Can you get stack dump on 11286 (or whatever
> is in flight in the next lockup)?

Wait, this series doesn't include Tetsuo's change.  Of course it won't
fix the deadlock problem.  What's necessary is Tetsuo's patch +
WQ_MEM_RECLAIM.

Thanks.

-- 
tejun


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-29  3:08           ` Tejun Heo
@ 2015-10-30  1:01             ` Christoph Lameter
  2015-10-31  1:15               ` Tejun Heo
  2015-10-31  2:43               ` Tetsuo Handa
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-10-30  1:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tetsuo Handa, akpm, mhocko, linux-mm, linux-kernel, torvalds,
	hannes, mgorman

On Thu, 29 Oct 2015, Tejun Heo wrote:

> Wait, this series doesn't include Tetsuo's change.  Of course it won't
> fix the deadlock problem.  What's necessary is Tetsuo's patch +
> WQ_MEM_RECLAIM.

This series is only dealing with vmstat changes. Do I get an ack here?



* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-30  1:01             ` Christoph Lameter
@ 2015-10-31  1:15               ` Tejun Heo
  2015-10-31  2:43               ` Tetsuo Handa
  1 sibling, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2015-10-31  1:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tetsuo Handa, akpm, mhocko, linux-mm, linux-kernel, torvalds,
	hannes, mgorman

On Thu, Oct 29, 2015 at 08:01:12PM -0500, Christoph Lameter wrote:
> On Thu, 29 Oct 2015, Tejun Heo wrote:
> 
> > Wait, this series doesn't include Tetsuo's change.  Of course it won't
> > fix the deadlock problem.  What's necessary is Tetsuo's patch +
> > WQ_MEM_RECLAIM.
> 
> This series is only dealing with vmstat changes. Do I get an ack here?

Yeap, please feel free to add my acked-by.

Thanks.

-- 
tejun


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-30  1:01             ` Christoph Lameter
  2015-10-31  1:15               ` Tejun Heo
@ 2015-10-31  2:43               ` Tetsuo Handa
  2015-11-02 16:12                 ` Christoph Lameter
  1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2015-10-31  2:43 UTC (permalink / raw)
  To: cl
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

Christoph Lameter wrote:
> On Thu, 29 Oct 2015, Tejun Heo wrote:
> 
> > Wait, this series doesn't include Tetsuo's change.  Of course it won't
> > fix the deadlock problem.  What's necessary is Tetsuo's patch +
> > WQ_MEM_RECLAIM.
> 
> This series is only dealing with vmstat changes. Do I get an ack here?
> 

Then, you need to update the description below (or drop it), because
patch 3/3 alone will not guarantee that the counters are up to date.

  Christoph Lameter wrote:
  > Seems that vmstat needs its own workqueue now, since the general
  > workqueue mechanism has been *enhanced*, which means that the
  > vmstat_updates cannot run reliably but are being blocked by
  > work requests doing memory allocation. This causes vmstat
  > to be unable to keep the counters up to date.

I am waiting for a decision on the candidates listed at
http://lkml.kernel.org/r/201510251952.CEF04109.OSOtLFHFVFJMQO@I-love.SAKURA.ne.jp .
If your series is not for backporting, please choose one from
the candidates. Can you accept the original patch at
http://lkml.kernel.org/r/201510212126.JIF90648.HOOFJVFQLMStOF@I-love.SAKURA.ne.jp
which implements (1) from the candidates?


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-10-31  2:43               ` Tetsuo Handa
@ 2015-11-02 16:12                 ` Christoph Lameter
  2015-11-02 16:52                   ` Tetsuo Handa
  2015-11-06 11:28                   ` Tetsuo Handa
  0 siblings, 2 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-11-02 16:12 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

On Sat, 31 Oct 2015, Tetsuo Handa wrote:

> Then, you need to update the description below (or drop it), because
> patch 3/3 alone will not guarantee that the counters are up to date.

The vmstat system does not guarantee that the counters are up to date
always. The whole point is the deferral of updates for performance
reasons. They are updated *at some point* within stat_interval. That needs
to happen and that is what this patchset is fixing.
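
For context, the deferral referred to here is the existing per-cpu
differential scheme in mm/vmstat.c: counter updates land in small
per-cpu diffs and are folded into the global counters only when a
per-cpu threshold is crossed or when refresh_cpu_vm_stats() runs.
A simplified sketch of that scheme (not code from this series;
overstepping, preemption handling and the global vm_stat totals are
omitted):

	/* Each CPU accumulates a small signed diff per counter. */
	static void mod_counter(struct zone *zone, enum zone_stat_item item,
				long delta)
	{
		struct per_cpu_pageset *pcp = this_cpu_ptr(zone->pageset);
		s8 *p = pcp->vm_stat_diff + item;
		long x = delta + *p;

		if (x > pcp->stat_threshold || x < -pcp->stat_threshold) {
			/* Fold once the diff is large enough; otherwise the
			 * global counter lags by at most the threshold until
			 * the periodic worker folds the remainder. */
			atomic_long_add(x, &zone->vm_stat[item]);
			x = 0;
		}
		*p = x;
	}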



* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-11-02 16:12                 ` Christoph Lameter
@ 2015-11-02 16:52                   ` Tetsuo Handa
  2015-11-02 18:10                     ` Christoph Lameter
  2015-11-06 11:28                   ` Tetsuo Handa
  1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2015-11-02 16:52 UTC (permalink / raw)
  To: cl
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

Christoph Lameter wrote:
> On Sat, 31 Oct 2015, Tetsuo Handa wrote:
> 
> > Then, you need to update the description below (or drop it), because
> > patch 3/3 alone will not guarantee that the counters are up to date.
> 
> The vmstat system does not guarantee that the counters are up to date
> always. The whole point is the deferral of updates for performance
> reasons. They are updated *at some point* within stat_interval. That needs
> to happen and that is what this patchset is fixing.
> 
I'm still unclear. I think that the result of this patchset is

  The counters are never updated even after stat_interval
  if some workqueue item is doing a __GFP_WAIT memory allocation.

but the patch description sounds as if

  The counters will be updated even if some workqueue item is
  doing a __GFP_WAIT memory allocation.

which contradicts the actual result I observed with this patchset applied.


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-11-02 16:52                   ` Tetsuo Handa
@ 2015-11-02 18:10                     ` Christoph Lameter
  2015-11-02 19:11                       ` Tejun Heo
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Lameter @ 2015-11-02 18:10 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

On Tue, 3 Nov 2015, Tetsuo Handa wrote:

> I'm still unclear. I think that the result of this patchset is
>
>   The counters are never updated even after stat_interval
>   if some workqueue item is doing a __GFP_WAIT memory allocation.
>
> but the patch description sounds as if
>
>   The counters will be updated even if some workqueue item is
>   doing a __GFP_WAIT memory allocation.
>
> which contradicts the actual result I observed with this patchset applied.

Well, true, that is dependent on the correct workqueue operation. I thought
that was fixed by Tejun?



* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-11-02 18:10                     ` Christoph Lameter
@ 2015-11-02 19:11                       ` Tejun Heo
  0 siblings, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2015-11-02 19:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tetsuo Handa, akpm, mhocko, linux-mm, linux-kernel, torvalds,
	hannes, mgorman

On Mon, Nov 02, 2015 at 12:10:04PM -0600, Christoph Lameter wrote:
> Well, true, that is dependent on the correct workqueue operation. I thought
> that was fixed by Tejun?

At least for now, we're going with Tetsuo's short sleep patch.

Thanks.

-- 
tejun


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-11-02 16:12                 ` Christoph Lameter
  2015-11-02 16:52                   ` Tetsuo Handa
@ 2015-11-06 11:28                   ` Tetsuo Handa
  2015-11-06 12:56                     ` Christoph Lameter
  1 sibling, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2015-11-06 11:28 UTC (permalink / raw)
  To: cl
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

Christoph Lameter wrote:
> On Sat, 31 Oct 2015, Tetsuo Handa wrote:
> 
> > Then, you need to update the description below (or drop it), because
> > patch 3/3 alone will not guarantee that the counters are up to date.
> 
> The vmstat system does not guarantee that the counters are up to date
> always. The whole point is the deferral of updates for performance
> reasons. They are updated *at some point* within stat_interval. That needs
> to happen and that is what this patchset is fixing.

So, if you refer to the blocking of the execution of vmstat updates, the
description for patch 3/3 should be updated to something like below?

----------
Since __GFP_WAIT memory allocations do not call schedule()
when there is nothing to reclaim, and the workqueue does not kick
the remaining workqueue items unless the in-flight workqueue item
calls schedule(), __GFP_WAIT memory allocation requests issued by
workqueue items can block the vmstat_update work item forever.

Since the zone_reclaimable() decision depends on the vmstat counters
being up to date, a silent lockup occurs because a workqueue
item doing a __GFP_WAIT memory allocation request keeps
using outdated vmstat counters.

In order to fix this problem, we need to allocate a dedicated
workqueue for vmstat. Note that this patch itself does not fix
the lockup problem. Tejun will develop a patch which detects the
lockup situation and kicks the remaining workqueue items.
----------
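
To make the failure mode concrete, a rough sketch of the interaction
described above, assuming the zone_reclaimable()-based retry loop the
page allocator used at the time (illustrative only, not code from any
of the patches):

	/*
	 *  some_work_fn()                     [in flight on the pool of CPU N]
	 *    kmalloc(..., GFP_KERNEL)           a __GFP_WAIT allocation
	 *      the page allocator keeps retrying while zone_reclaimable()
	 *      is true; zone_reclaimable() compares NR_PAGES_SCANNED with
	 *      the reclaimable page counts, all read via the vmstat counters
	 *
	 *  vmstat_update                      [pending on the same pool]
	 *    would fold the per-cpu diffs that zone_reclaimable() needs,
	 *    but never runs: the allocating item never sleeps, so the
	 *    pool never starts executing another work item
	 *
	 * Result: stale counters keep zone_reclaimable() returning true and
	 * the allocation loops forever, a silent livelock.
	 */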


* Re: [patch 3/3] vmstat: Create our own workqueue
  2015-11-06 11:28                   ` Tetsuo Handa
@ 2015-11-06 12:56                     ` Christoph Lameter
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Lameter @ 2015-11-06 12:56 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: htejun, akpm, mhocko, linux-mm, linux-kernel, torvalds, hannes, mgorman

On Fri, 6 Nov 2015, Tetsuo Handa wrote:

> So, if you refer to the blocking of the execution of vmstat updates,
> description for patch 3/3 sould be updated to something like below?

Ok that is much better.

> ----------
> Since __GFP_WAIT memory allocations do not call schedule()
> when there is nothing to reclaim, and the workqueue does not kick
> the remaining workqueue items unless the in-flight workqueue item
> calls schedule(), __GFP_WAIT memory allocation requests issued by
> workqueue items can block the vmstat_update work item forever.
>
> Since the zone_reclaimable() decision depends on the vmstat counters
> being up to date, a silent lockup occurs because a workqueue
> item doing a __GFP_WAIT memory allocation request keeps
> using outdated vmstat counters.
>
> In order to fix this problem, we need to allocate a dedicated
> workqueue for vmstat. Note that this patch itself does not fix
> the lockup problem. Tejun will develop a patch which detects the
> lockup situation and kicks the remaining workqueue items.
> ----------
>
