All of lore.kernel.org
 help / color / mirror / Atom feed
* mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-19  0:33 ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-19  0:33 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm

Hi all,

I've started seeing the following in the latest -next kernel.

[  531.127489] kernel BUG at mm/vmstat.c:1408!
[  531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[  531.128872] Modules linked in:
[  531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750
[  531.130939] Workqueue: vmstat vmstat_update
[  531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000
[  531.133189] RIP: vmstat_update (mm/vmstat.c:1408)
[  531.134466] RSP: 0018:ffff88020407fbf8  EFLAGS: 00010293
[  531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000
[  531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4
[  531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230
[  531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60
[  531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0
[  531.140261] FS:  0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000
[  531.141218] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0
[  531.142752] Stack:
[  531.142963]  ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8
[  531.144095]  ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb
[  531.145258]  ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000
[  531.146475] Call Trace:
[  531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045)
[  531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171)
[  531.155176] kthread (kernel/kthread.c:209)
[  531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469)
[ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c
All code
========
   0:   75 1e                   jne    0x20
   2:   be 79 00 00 00          mov    $0x79,%esi
   7:   48 c7 c7 80 0f 10 8c    mov    $0xffffffff8c100f80,%rdi
   e:   89 45 e4                mov    %eax,-0x1c(%rbp)
  11:   e8 cd 92 cd ff          callq  0xffffffffffcd92e3
  16:   8b 45 e4                mov    -0x1c(%rbp),%eax
  19:   c6 05 e1 c4 13 1a 01    movb   $0x1,0x1a13c4e1(%rip)        # 0x1a13c501
  20:   89 c0                   mov    %eax,%eax
  22:   f0 48 0f ab 03          lock bts %rax,(%rbx)
  27:   72 02                   jb     0x2b
  29:   eb 0e                   jmp    0x39
  2b:*  0f 0b                   ud2             <-- trapping instruction
  2d:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
  34:   e8 3d 03 ae 01          callq  0x1ae0376
  39:   48 83 c4 08             add    $0x8,%rsp
  3d:   5b                      pop    %rbx
  3e:   41 5c                   pop    %r12
        ...

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
   9:   e8 3d 03 ae 01          callq  0x1ae034b
   e:   48 83 c4 08             add    $0x8,%rsp
  12:   5b                      pop    %rbx
  13:   41 5c                   pop    %r12
        ...
[  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
[  531.165523]  RSP <ffff88020407fbf8>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 114+ messages in thread

* mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-19  0:33 ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-19  0:33 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm

Hi all,

I've started seeing the following in the latest -next kernel.

[  531.127489] kernel BUG at mm/vmstat.c:1408!
[  531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[  531.128872] Modules linked in:
[  531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750
[  531.130939] Workqueue: vmstat vmstat_update
[  531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000
[  531.133189] RIP: vmstat_update (mm/vmstat.c:1408)
[  531.134466] RSP: 0018:ffff88020407fbf8  EFLAGS: 00010293
[  531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000
[  531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4
[  531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230
[  531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60
[  531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0
[  531.140261] FS:  0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000
[  531.141218] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0
[  531.142752] Stack:
[  531.142963]  ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8
[  531.144095]  ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb
[  531.145258]  ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000
[  531.146475] Call Trace:
[  531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045)
[  531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171)
[  531.155176] kthread (kernel/kthread.c:209)
[  531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469)
[ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c
All code
========
   0:   75 1e                   jne    0x20
   2:   be 79 00 00 00          mov    $0x79,%esi
   7:   48 c7 c7 80 0f 10 8c    mov    $0xffffffff8c100f80,%rdi
   e:   89 45 e4                mov    %eax,-0x1c(%rbp)
  11:   e8 cd 92 cd ff          callq  0xffffffffffcd92e3
  16:   8b 45 e4                mov    -0x1c(%rbp),%eax
  19:   c6 05 e1 c4 13 1a 01    movb   $0x1,0x1a13c4e1(%rip)        # 0x1a13c501
  20:   89 c0                   mov    %eax,%eax
  22:   f0 48 0f ab 03          lock bts %rax,(%rbx)
  27:   72 02                   jb     0x2b
  29:   eb 0e                   jmp    0x39
  2b:*  0f 0b                   ud2             <-- trapping instruction
  2d:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
  34:   e8 3d 03 ae 01          callq  0x1ae0376
  39:   48 83 c4 08             add    $0x8,%rsp
  3d:   5b                      pop    %rbx
  3e:   41 5c                   pop    %r12
        ...

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
   9:   e8 3d 03 ae 01          callq  0x1ae034b
   e:   48 83 c4 08             add    $0x8,%rsp
  12:   5b                      pop    %rbx
  13:   41 5c                   pop    %r12
        ...
[  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
[  531.165523]  RSP <ffff88020407fbf8>


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-19  0:33 ` Sasha Levin
@ 2015-12-21 13:08   ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-21 13:08 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Fri, 18 Dec 2015, Sasha Levin wrote:

> [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)

Hmmm.. Yes we need to fold the diffs first before disabling the timer
otherwise the shepherd task may intervene.

Does this patch fix it?


Subject: quiet_vmstat: Avoid race with shepherd by folding counters first

We need to fold the counters first otherwise the shepherd task may
remotely reactivate the vmstat worker.

This also avoids the strange loop. Nothing can really increase the
counters at that point since we are in the cpu idle loop. So
folding the counters once is enough. Cancelling work that does
not exist is fine too so just avoid the branches completely.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
 	if (system_state != SYSTEM_RUNNING)
 		return;

-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
+	refresh_cpu_vm_stats(false);
+	cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+	cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 }

 /*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-21 13:08   ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-21 13:08 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Fri, 18 Dec 2015, Sasha Levin wrote:

> [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)

Hmmm.. Yes we need to fold the diffs first before disabling the timer
otherwise the shepherd task may intervene.

Does this patch fix it?


Subject: quiet_vmstat: Avoid race with shepherd by folding counters first

We need to fold the counters first otherwise the shepherd task may
remotely reactivate the vmstat worker.

This also avoids the strange loop. Nothing can really increase the
counters at that point since we are in the cpu idle loop. So
folding the counters once is enough. Cancelling work that does
not exist is fine too so just avoid the branches completely.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
 	if (system_state != SYSTEM_RUNNING)
 		return;

-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
+	refresh_cpu_vm_stats(false);
+	cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+	cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 }

 /*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-21 13:08   ` Christoph Lameter
@ 2015-12-21 20:28     ` Sasha Levin
  -1 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-21 20:28 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 12/21/2015 08:08 AM, Christoph Lameter wrote:
> On Fri, 18 Dec 2015, Sasha Levin wrote:
> 
>> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
> Hmmm.. Yes we need to fold the diffs first before disabling the timer
> otherwise the shepherd task may intervene.
> 
> Does this patch fix it?

It didn't. With the patch I'm still seeing:

[ 1997.674112] kernel BUG at mm/vmstat.c:1408!
[ 1997.674930] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 1997.676252] Modules linked in:
[ 1997.676880] CPU: 4 PID: 14713 Comm: kworker/4:0 Not tainted 4.4.0-rc5-next-20151221-sasha-00020-g840272e-dirty #2753
[ 1997.679262] Workqueue: vmstat vmstat_update
[ 1997.680109] task: ffff88015bca0000 ti: ffff88015bcb8000 task.ti: ffff88015bcb8000
[ 1997.681279] RIP: 0010:[<ffffffffa2683c48>]  [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0
[ 1997.682810] RSP: 0018:ffff88015bcbfc00  EFLAGS: 00010297
[ 1997.683611] RAX: 0000000000000004 RBX: ffff8803d2801a18 RCX: 0000000000000000
[ 1997.684689] RDX: 0000000000000007 RSI: ffffffffad098220 RDI: ffffffffbc8be724
[ 1997.685750] RBP: ffff88015bcbfc20 R08: 0000000000000000 R09: ffff88015bca0230
[ 1997.686812] R10: ffffffffad098220 R11: ffff880180a1be78 R12: ffff880180a1be60
[ 1997.688087] R13: ffff880157b27908 R14: ffff880180a21000 R15: ffff880157b27900
[ 1997.689120] FS:  0000000000000000(0000) GS:ffff880180a00000(0000) knlGS:0000000000000000
[ 1997.690290] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1997.691141] CR2: 00007f05e5690000 CR3: 0000000158699000 CR4: 00000000000006a0
[ 1997.692189] Stack:
[ 1997.692496]  ffff880180a21000 ffff880157b27918 ffff880180a1be60 ffff880157b27908
[ 1997.693689]  ffff88015bcbfd40 ffffffffa23aa40f ffffffffaf66686b ffff880157b27948
[ 1997.694851]  ffff880180a1be68 ffff880100000000 ffff880157b27910 ffff880157b27920
[ 1997.696013] Call Trace:
[ 1997.696542]  [<ffffffffa23aa40f>] process_one_work+0xacf/0x12a0
[ 1997.697516]  [<ffffffffa23a9940>] ? cancel_delayed_work_sync+0x10/0x10
[ 1997.698889]  [<fffffffface0df27>] ? __schedule+0x1127/0x14c0
[ 1997.699738]  [<ffffffffa23e11bd>] ? get_parent_ip+0xd/0x40
[ 1997.700547]  [<ffffffffa23e12d9>] ? preempt_count_add+0xe9/0x140
[ 1997.701426]  [<ffffffffa23ab898>] worker_thread+0xcb8/0x1090
[ 1997.702259]  [<ffffffffa23d2ad0>] ? load_mm_ldt+0x1f0/0x1f0
[ 1997.703084]  [<ffffffffa23d5b53>] ? update_rq_clock+0x123/0x2e0
[ 1997.703962]  [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0
[ 1997.704896]  [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0
[ 1997.705804]  [<ffffffffa23bf8ce>] kthread+0x31e/0x330
[ 1997.706551]  [<ffffffffa23d3195>] ? finish_task_switch+0x6c5/0x970
[ 1997.707481]  [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680
[ 1997.708374]  [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680
[ 1997.709269]  [<fffffffface1a50f>] ret_from_fork+0x3f/0x70
[ 1997.710067]  [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680
[ 1997.710964] Code: 75 1e be 79 00 00 00 48 c7 c7 40 16 10 ad 89 45 e4 e8 2d 88 cd ff 8b 45 e4 c6 05 a0 a9 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 21 48 b1 e8 45 d6 ad 01 48 83 c4 08 5b 41 5c
[ 1997.714961] RIP  [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0
[ 1997.715852]  RSP <ffff88015bcbfc00>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-21 20:28     ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-21 20:28 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 12/21/2015 08:08 AM, Christoph Lameter wrote:
> On Fri, 18 Dec 2015, Sasha Levin wrote:
> 
>> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
> Hmmm.. Yes we need to fold the diffs first before disabling the timer
> otherwise the shepherd task may intervene.
> 
> Does this patch fix it?

It didn't. With the patch I'm still seeing:

[ 1997.674112] kernel BUG at mm/vmstat.c:1408!
[ 1997.674930] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 1997.676252] Modules linked in:
[ 1997.676880] CPU: 4 PID: 14713 Comm: kworker/4:0 Not tainted 4.4.0-rc5-next-20151221-sasha-00020-g840272e-dirty #2753
[ 1997.679262] Workqueue: vmstat vmstat_update
[ 1997.680109] task: ffff88015bca0000 ti: ffff88015bcb8000 task.ti: ffff88015bcb8000
[ 1997.681279] RIP: 0010:[<ffffffffa2683c48>]  [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0
[ 1997.682810] RSP: 0018:ffff88015bcbfc00  EFLAGS: 00010297
[ 1997.683611] RAX: 0000000000000004 RBX: ffff8803d2801a18 RCX: 0000000000000000
[ 1997.684689] RDX: 0000000000000007 RSI: ffffffffad098220 RDI: ffffffffbc8be724
[ 1997.685750] RBP: ffff88015bcbfc20 R08: 0000000000000000 R09: ffff88015bca0230
[ 1997.686812] R10: ffffffffad098220 R11: ffff880180a1be78 R12: ffff880180a1be60
[ 1997.688087] R13: ffff880157b27908 R14: ffff880180a21000 R15: ffff880157b27900
[ 1997.689120] FS:  0000000000000000(0000) GS:ffff880180a00000(0000) knlGS:0000000000000000
[ 1997.690290] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1997.691141] CR2: 00007f05e5690000 CR3: 0000000158699000 CR4: 00000000000006a0
[ 1997.692189] Stack:
[ 1997.692496]  ffff880180a21000 ffff880157b27918 ffff880180a1be60 ffff880157b27908
[ 1997.693689]  ffff88015bcbfd40 ffffffffa23aa40f ffffffffaf66686b ffff880157b27948
[ 1997.694851]  ffff880180a1be68 ffff880100000000 ffff880157b27910 ffff880157b27920
[ 1997.696013] Call Trace:
[ 1997.696542]  [<ffffffffa23aa40f>] process_one_work+0xacf/0x12a0
[ 1997.697516]  [<ffffffffa23a9940>] ? cancel_delayed_work_sync+0x10/0x10
[ 1997.698889]  [<fffffffface0df27>] ? __schedule+0x1127/0x14c0
[ 1997.699738]  [<ffffffffa23e11bd>] ? get_parent_ip+0xd/0x40
[ 1997.700547]  [<ffffffffa23e12d9>] ? preempt_count_add+0xe9/0x140
[ 1997.701426]  [<ffffffffa23ab898>] worker_thread+0xcb8/0x1090
[ 1997.702259]  [<ffffffffa23d2ad0>] ? load_mm_ldt+0x1f0/0x1f0
[ 1997.703084]  [<ffffffffa23d5b53>] ? update_rq_clock+0x123/0x2e0
[ 1997.703962]  [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0
[ 1997.704896]  [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0
[ 1997.705804]  [<ffffffffa23bf8ce>] kthread+0x31e/0x330
[ 1997.706551]  [<ffffffffa23d3195>] ? finish_task_switch+0x6c5/0x970
[ 1997.707481]  [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680
[ 1997.708374]  [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680
[ 1997.709269]  [<fffffffface1a50f>] ret_from_fork+0x3f/0x70
[ 1997.710067]  [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680
[ 1997.710964] Code: 75 1e be 79 00 00 00 48 c7 c7 40 16 10 ad 89 45 e4 e8 2d 88 cd ff 8b 45 e4 c6 05 a0 a9 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 21 48 b1 e8 45 d6 ad 01 48 83 c4 08 5b 41 5c
[ 1997.714961] RIP  [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0
[ 1997.715852]  RSP <ffff88015bcbfc00>


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-21 20:28     ` Sasha Levin
@ 2015-12-21 21:07       ` Sasha Levin
  -1 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-21 21:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 12/21/2015 03:28 PM, Sasha Levin wrote:
> On 12/21/2015 08:08 AM, Christoph Lameter wrote:
>> > On Fri, 18 Dec 2015, Sasha Levin wrote:
>> > 
>>>> >> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
>> > Hmmm.. Yes we need to fold the diffs first before disabling the timer
>> > otherwise the shepherd task may intervene.
>> > 
>> > Does this patch fix it?
> It didn't. With the patch I'm still seeing:

I've also noticed a new warning from the workqueue code which my scripts
didn't pick up before:

[ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s!
[ 3462.522041] workqueue vmstat: flags=0xc
[ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
[ 3462.554836]     pending: vmstat_update


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-21 21:07       ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-21 21:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 12/21/2015 03:28 PM, Sasha Levin wrote:
> On 12/21/2015 08:08 AM, Christoph Lameter wrote:
>> > On Fri, 18 Dec 2015, Sasha Levin wrote:
>> > 
>>>> >> > [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
>> > Hmmm.. Yes we need to fold the diffs first before disabling the timer
>> > otherwise the shepherd task may intervene.
>> > 
>> > Does this patch fix it?
> It didn't. With the patch I'm still seeing:

I've also noticed a new warning from the workqueue code which my scripts
didn't pick up before:

[ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s!
[ 3462.522041] workqueue vmstat: flags=0xc
[ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
[ 3462.554836]     pending: vmstat_update


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-21 21:07       ` Sasha Levin
@ 2015-12-21 21:14         ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-21 21:14 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Mon, 21 Dec 2015, Sasha Levin wrote:

> I've also noticed a new warning from the workqueue code which my scripts
> didn't pick up before:
>
> [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s!
> [ 3462.522041] workqueue vmstat: flags=0xc
> [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> [ 3462.554836]     pending: vmstat_update

Does that mean that vmstat_update locks up or something that schedules it?

Also what workload triggers the BUG()?


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-21 21:14         ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-21 21:14 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Mon, 21 Dec 2015, Sasha Levin wrote:

> I've also noticed a new warning from the workqueue code which my scripts
> didn't pick up before:
>
> [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s!
> [ 3462.522041] workqueue vmstat: flags=0xc
> [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> [ 3462.554836]     pending: vmstat_update

Does that mean that vmstat_update locks up or something that schedules it?

Also what workload triggers the BUG()?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-21 21:14         ` Christoph Lameter
@ 2015-12-22 17:21           ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-22 17:21 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

Ran this here but no errors. Need config etc to reproduce.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-22 17:21           ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-22 17:21 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

Ran this here but no errors. Need config etc to reproduce.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-21 21:14         ` Christoph Lameter
  (?)
  (?)
@ 2015-12-24 20:14         ` Sasha Levin
  2015-12-29 17:01             ` Christoph Lameter
  2016-01-04 18:05             ` Christoph Lameter
  -1 siblings, 2 replies; 114+ messages in thread
From: Sasha Levin @ 2015-12-24 20:14 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

[-- Attachment #1: Type: text/plain, Size: 741 bytes --]

On 12/21/2015 04:14 PM, Christoph Lameter wrote:
> On Mon, 21 Dec 2015, Sasha Levin wrote:
> 
>> > I've also noticed a new warning from the workqueue code which my scripts
>> > didn't pick up before:
>> >
>> > [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s!
>> > [ 3462.522041] workqueue vmstat: flags=0xc
>> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
>> > [ 3462.554836]     pending: vmstat_update
> Does that mean that vmstat_update locks up or something that schedules it?

I think it means that vmstat_update didn't finish running (working).

> Also what workload triggers the BUG()?

Fuzzing with trinity inside a KVM guest. I've attached my config.


Thanks,
Sasha


[-- Attachment #2: config-sasha.gz --]
[-- Type: application/gzip, Size: 43528 bytes --]

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-24 20:14         ` Sasha Levin
@ 2015-12-29 17:01             ` Christoph Lameter
  2016-01-04 18:05             ` Christoph Lameter
  1 sibling, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-29 17:01 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Thu, 24 Dec 2015, Sasha Levin wrote:

> >> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> >> > [ 3462.554836]     pending: vmstat_update
> > Does that mean that vmstat_update locks up or something that schedules it?
>
> I think it means that vmstat_update didn't finish running (working).

There is nothing in there that blocks.

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok will have a look.


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-29 17:01             ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-29 17:01 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Thu, 24 Dec 2015, Sasha Levin wrote:

> >> > [ 3462.527795]   pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256
> >> > [ 3462.554836]     pending: vmstat_update
> > Does that mean that vmstat_update locks up or something that schedules it?
>
> I think it means that vmstat_update didn't finish running (working).

There is nothing in there that blocks.

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok will have a look.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-29 17:01             ` Christoph Lameter
@ 2015-12-29 17:18               ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-29 17:18 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

Builds with your kernel config fail with

 CHK     include/generated/bounds.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
make[1]: *** No rule to make target 'signing_key.pem', needed by
'certs/signing_key.x509'.  Stop.
Makefile:967: recipe for target 'certs' failed
make: *** [certs] Error 2
make: *** Waiting for unfinished jobs....
  CHK     kernel/config_data.h

???


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2015-12-29 17:18               ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2015-12-29 17:18 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

Builds with your kernel config fail with

 CHK     include/generated/bounds.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
make[1]: *** No rule to make target 'signing_key.pem', needed by
'certs/signing_key.x509'.  Stop.
Makefile:967: recipe for target 'certs' failed
make: *** [certs] Error 2
make: *** Waiting for unfinished jobs....
  CHK     kernel/config_data.h

???

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-24 20:14         ` Sasha Levin
@ 2016-01-04 18:05             ` Christoph Lameter
  2016-01-04 18:05             ` Christoph Lameter
  1 sibling, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-04 18:05 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Thu, 24 Dec 2015, Sasha Levin wrote:

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok build and bootup works fine after fix from Tetsuo to config. Does not
like my initrd it seems. Is there a root with the tools available somehow?


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-04 18:05             ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-04 18:05 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm

On Thu, 24 Dec 2015, Sasha Levin wrote:

> > Also what workload triggers the BUG()?
>
> Fuzzing with trinity inside a KVM guest. I've attached my config.

Ok build and bootup works fine after fix from Tetsuo to config. Does not
like my initrd it seems. Is there a root with the tools available somehow?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-04 18:05             ` Christoph Lameter
@ 2016-01-04 18:46               ` Sasha Levin
  -1 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-04 18:46 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 01/04/2016 01:05 PM, Christoph Lameter wrote:
> On Thu, 24 Dec 2015, Sasha Levin wrote:
> 
>>> Also what workload triggers the BUG()?
>>
>> Fuzzing with trinity inside a KVM guest. I've attached my config.
> 
> Ok build and bootup works fine after fix from Tetsuo to config. Does not
> like my initrd it seems. Is there a root with the tools available somehow?

Will is hosting a stand-alone version here: git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-04 18:46               ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-04 18:46 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 01/04/2016 01:05 PM, Christoph Lameter wrote:
> On Thu, 24 Dec 2015, Sasha Levin wrote:
> 
>>> Also what workload triggers the BUG()?
>>
>> Fuzzing with trinity inside a KVM guest. I've attached my config.
> 
> Ok build and bootup works fine after fix from Tetsuo to config. Does not
> like my initrd it seems. Is there a root with the tools available somehow?

Will is hosting a stand-alone version here: git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-21 13:08   ` Christoph Lameter
@ 2016-01-12 11:31     ` Shiraz Hashim
  -1 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-12 11:31 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Mon, Dec 21, 2015 at 6:38 PM, Christoph Lameter <cl@linux.com> wrote:
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
>         if (system_state != SYSTEM_RUNNING)
>                 return;
>
> -       do {
> -               if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> -                       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> -
> -       } while (refresh_cpu_vm_stats(false));
> +       refresh_cpu_vm_stats(false);
> +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>

shouldn't this be cancel_delayed_work_sync ?

> +       cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>

else ongoing vmstat_update can again encounter cpu_stat_off set ?

>  }
>

-- 
regards
Shiraz Hashim

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-12 11:31     ` Shiraz Hashim
  0 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-12 11:31 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Mon, Dec 21, 2015 at 6:38 PM, Christoph Lameter <cl@linux.com> wrote:
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
>         if (system_state != SYSTEM_RUNNING)
>                 return;
>
> -       do {
> -               if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> -                       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> -
> -       } while (refresh_cpu_vm_stats(false));
> +       refresh_cpu_vm_stats(false);
> +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>

shouldn't this be cancel_delayed_work_sync ?

> +       cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>

else ongoing vmstat_update can again encounter cpu_stat_off set ?

>  }
>

-- 
regards
Shiraz Hashim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-12 11:31     ` Shiraz Hashim
@ 2016-01-12 12:23       ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-12 12:23 UTC (permalink / raw)
  To: Shiraz Hashim; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Tue, 12 Jan 2016, Shiraz Hashim wrote:

> > +       refresh_cpu_vm_stats(false);
> > +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> >
>
> shouldn't this be cancel_delayed_work_sync ?

Hmmm... This is executed with preemption off and the work is on the same
cpu. If it would be able to run concurrently then we would need this.

Ok but it could run from the timer interrupt if that is still on and
occuring shortly before we go idle. Guess this needs to be similar to
the code we execute on cpu down in the vmstat notifiers (see
vmstat_cpuup_callback).

Does this fix it? I have not been able to reproduce the issue so far.

Patch against -next.



Subject: vmstat: Use delayed work_sync and avoid loop.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
 	if (system_state != SYSTEM_RUNNING)
 		return;

-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
+	refresh_cpu_vm_stats(false);
+	cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work));
+	cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 }

 /*

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-12 12:23       ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-12 12:23 UTC (permalink / raw)
  To: Shiraz Hashim; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Tue, 12 Jan 2016, Shiraz Hashim wrote:

> > +       refresh_cpu_vm_stats(false);
> > +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> >
>
> shouldn't this be cancel_delayed_work_sync ?

Hmmm... This is executed with preemption off and the work is on the same
cpu. If it would be able to run concurrently then we would need this.

Ok but it could run from the timer interrupt if that is still on and
occuring shortly before we go idle. Guess this needs to be similar to
the code we execute on cpu down in the vmstat notifiers (see
vmstat_cpuup_callback).

Does this fix it? I have not been able to reproduce the issue so far.

Patch against -next.



Subject: vmstat: Use delayed work_sync and avoid loop.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
 	if (system_state != SYSTEM_RUNNING)
 		return;

-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
+	refresh_cpu_vm_stats(false);
+	cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work));
+	cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 }

 /*




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-12 12:23       ` Christoph Lameter
@ 2016-01-12 12:27         ` Shiraz Hashim
  -1 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-12 12:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote:
> Does this fix it? I have not been able to reproduce the issue so far.

I too am not able to reproduce. Was just thinking what else can go
wrong in Sasha's setup.

-- 
regards
Shiraz Hashim

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-12 12:27         ` Shiraz Hashim
  0 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-12 12:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote:
> Does this fix it? I have not been able to reproduce the issue so far.

I too am not able to reproduce. Was just thinking what else can go
wrong in Sasha's setup.

-- 
regards
Shiraz Hashim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-12 12:23       ` Christoph Lameter
@ 2016-01-13 11:36         ` Shiraz Hashim
  -1 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-13 11:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

Hi Sasha,

On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote:
> On Tue, 12 Jan 2016, Shiraz Hashim wrote:
>
>> > +       refresh_cpu_vm_stats(false);
>> > +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>> >
>>
>> shouldn't this be cancel_delayed_work_sync ?
>
> Hmmm... This is executed with preemption off and the work is on the same
> cpu. If it would be able to run concurrently then we would need this.
>
> Ok but it could run from the timer interrupt if that is still on and
> occuring shortly before we go idle. Guess this needs to be similar to
> the code we execute on cpu down in the vmstat notifiers (see
> vmstat_cpuup_callback).
>
> Does this fix it? I have not been able to reproduce the issue so far.
>
> Patch against -next.
>
>
>
> Subject: vmstat: Use delayed work_sync and avoid loop.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
>         if (system_state != SYSTEM_RUNNING)
>                 return;
>
> -       do {
> -               if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> -                       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> -
> -       } while (refresh_cpu_vm_stats(false));
> +       refresh_cpu_vm_stats(false);
> +       cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work));
> +       cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>  }
>
>  /*

Can you please give it a try, seems it is reproducing easily at your end.

-- 
regards
Shiraz Hashim

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-13 11:36         ` Shiraz Hashim
  0 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-13 11:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

Hi Sasha,

On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote:
> On Tue, 12 Jan 2016, Shiraz Hashim wrote:
>
>> > +       refresh_cpu_vm_stats(false);
>> > +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>> >
>>
>> shouldn't this be cancel_delayed_work_sync ?
>
> Hmmm... This is executed with preemption off and the work is on the same
> cpu. If it would be able to run concurrently then we would need this.
>
> Ok but it could run from the timer interrupt if that is still on and
> occuring shortly before we go idle. Guess this needs to be similar to
> the code we execute on cpu down in the vmstat notifiers (see
> vmstat_cpuup_callback).
>
> Does this fix it? I have not been able to reproduce the issue so far.
>
> Patch against -next.
>
>
>
> Subject: vmstat: Use delayed work_sync and avoid loop.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
>         if (system_state != SYSTEM_RUNNING)
>                 return;
>
> -       do {
> -               if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> -                       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> -
> -       } while (refresh_cpu_vm_stats(false));
> +       refresh_cpu_vm_stats(false);
> +       cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work));
> +       cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>  }
>
>  /*

Can you please give it a try, seems it is reproducing easily at your end.

-- 
regards
Shiraz Hashim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-13 11:36         ` Shiraz Hashim
@ 2016-01-13 12:32           ` Shiraz Hashim
  -1 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-13 12:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Wed, Jan 13, 2016 at 5:06 PM, Shiraz Hashim
<shiraz.linux.kernel@gmail.com> wrote:
>
> Can you please give it a try, seems it is reproducing easily at your end.
>

Ahh!, it cannot be done as it is called with preemption disabled.

[    8.131226] Process swapper/4 (pid: 0, stack limit = 0xffffffc06fc18058)
[    8.137908] Call trace:
[    8.140344] [<ffffffc0000c19cc>] __might_sleep+0x15c/0x16c
[    8.145812] [<ffffffc0000b3d64>] flush_work+0x3c/0x190
[    8.150933] [<ffffffc0000b4450>] __cancel_work_timer+0x138/0x1e4
[    8.156922] [<ffffffc0000b45a8>] cancel_delayed_work_sync+0xc/0x18
[    8.163087] [<ffffffc0001782dc>] quiet_vmstat+0x3c/0x60
[    8.168294] [<ffffffc0000dd2f0>] cpu_startup_entry+0x2c/0x330
[    8.174024] [<ffffffc000090b24>] secondary_start_kernel+0x124/0x134

-- 
regards
Shiraz Hashim

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-13 12:32           ` Shiraz Hashim
  0 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-13 12:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm

On Wed, Jan 13, 2016 at 5:06 PM, Shiraz Hashim
<shiraz.linux.kernel@gmail.com> wrote:
>
> Can you please give it a try, seems it is reproducing easily at your end.
>

Ahh!, it cannot be done as it is called with preemption disabled.

[    8.131226] Process swapper/4 (pid: 0, stack limit = 0xffffffc06fc18058)
[    8.137908] Call trace:
[    8.140344] [<ffffffc0000c19cc>] __might_sleep+0x15c/0x16c
[    8.145812] [<ffffffc0000b3d64>] flush_work+0x3c/0x190
[    8.150933] [<ffffffc0000b4450>] __cancel_work_timer+0x138/0x1e4
[    8.156922] [<ffffffc0000b45a8>] cancel_delayed_work_sync+0xc/0x18
[    8.163087] [<ffffffc0001782dc>] quiet_vmstat+0x3c/0x60
[    8.168294] [<ffffffc0000dd2f0>] cpu_startup_entry+0x2c/0x330
[    8.174024] [<ffffffc000090b24>] secondary_start_kernel+0x124/0x134

-- 
regards
Shiraz Hashim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-13 11:36         ` Shiraz Hashim
@ 2016-01-14 21:06           ` Sasha Levin
  -1 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-14 21:06 UTC (permalink / raw)
  To: Shiraz Hashim, Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 01/13/2016 06:36 AM, Shiraz Hashim wrote:
> Hi Sasha,
> 
> On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote:
>> On Tue, 12 Jan 2016, Shiraz Hashim wrote:
>>
>>>> +       refresh_cpu_vm_stats(false);
>>>> +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>>>>
>>>
>>> shouldn't this be cancel_delayed_work_sync ?
>>
>> Hmmm... This is executed with preemption off and the work is on the same
>> cpu. If it would be able to run concurrently then we would need this.
>>
>> Ok but it could run from the timer interrupt if that is still on and
>> occuring shortly before we go idle. Guess this needs to be similar to
>> the code we execute on cpu down in the vmstat notifiers (see
>> vmstat_cpuup_callback).
>>
>> Does this fix it? I have not been able to reproduce the issue so far.
>>
>> Patch against -next.
>>
>>
>>
>> Subject: vmstat: Use delayed work_sync and avoid loop.
>>
>> Signed-off-by: Christoph Lameter <cl@linux.com>
>>
>> Index: linux/mm/vmstat.c
>> ===================================================================
>> --- linux.orig/mm/vmstat.c
>> +++ linux/mm/vmstat.c
>> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
>>         if (system_state != SYSTEM_RUNNING)
>>                 return;
>>
>> -       do {
>> -               if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
>> -                       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>> -
>> -       } while (refresh_cpu_vm_stats(false));
>> +       refresh_cpu_vm_stats(false);
>> +       cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work));
>> +       cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>>  }
>>
>>  /*
> 
> Can you please give it a try, seems it is reproducing easily at your end.
> 

I'm seeing:

[3637853.902081] BUG: sleeping function called from invalid context at kernel/workqueue.c:2725
[3637853.904291] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[3637853.905488] no locks held by swapper/0/0.
[3637853.906387] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-next-20160114-sasha-00021-gf1273d1-dirty #2797
[3637853.908369]  1ffffffff5dc0f42 0f76d753b1b5b625 ffffffffaee07a90 ffffffffa2433eee
[3637853.909561]  0000000041b58ab3 ffffffffae960c30 ffffffffa2433e26 ffffffffaee49bc0
[3637853.911601]  0000000000000000 0000000000000000 0000000000000000 ffffffffaee07a78
[3637853.913281] Call Trace:
[3637853.913775]  [<ffffffffa2433eee>] dump_stack+0xc8/0x12a
[3637853.914649]  [<ffffffffa2433e26>] ? _atomic_dec_and_lock+0x106/0x106
[3637853.915824]  [<ffffffffa0440125>] ___might_sleep+0x2c5/0x480
[3637853.917056]  [<ffffffffa044033b>] __might_sleep+0x5b/0x260
[3637853.918310]  [<ffffffffa040a890>] flush_work+0xe0/0x820
[3637853.919270]  [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460
[3637853.921561]  [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460
[3637853.922898]  [<ffffffffa054d367>] ? del_timer+0x107/0x170
[3637853.924395]  [<ffffffffa054d260>] ? lock_timer_base+0x220/0x220
[3637853.925827]  [<ffffffffa0411445>] ? try_to_grab_pending+0x115/0x680
[3637853.926924]  [<ffffffffa0411b87>] ? __cancel_work_timer+0x1d7/0x5f0
[3637853.928215]  [<ffffffffa04d4b58>] ? mark_held_locks+0x1a8/0x220
[3637853.929075]  [<ffffffffa0411bdb>] __cancel_work_timer+0x22b/0x5f0
[3637853.930765]  [<ffffffffa04119b0>] ? try_to_grab_pending+0x680/0x680
[3637853.931642]  [<ffffffffa075fb60>] ? fill_contig_page_info+0x2e0/0x2e0
[3637853.932774]  [<ffffffffa0411fda>] cancel_delayed_work_sync+0x1a/0x20
[3637853.933897]  [<ffffffffa0766754>] quiet_vmstat+0x74/0x140
[3637853.934991]  [<ffffffffa04c20eb>] cpu_startup_entry+0x12b/0x6d0
[3637853.936174]  [<ffffffffa04c1fc0>] ? call_cpuidle+0x160/0x160
[3637853.937420]  [<ffffffffabfe3816>] rest_init+0x1d6/0x1e0
[3637853.938799]  [<ffffffffbb7e4c2a>] start_kernel+0x66c/0x6a6
[3637853.940742]  [<ffffffffbb7e45be>] ? thread_info_cache_init+0xb/0xb
[3637853.942416]  [<ffffffffbb9b0ae7>] ? memblock_reserve+0x59/0x5e
[3637853.943880]  [<ffffffffbb7e3120>] ? early_idt_handler_array+0x120/0x120
[3637853.945161]  [<ffffffffbb7e33c4>] x86_64_start_reservations+0x2a/0x2c
[3637853.946005]  [<ffffffffbb7e351d>] x86_64_start_kernel+0x157/0x17a

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-14 21:06           ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-14 21:06 UTC (permalink / raw)
  To: Shiraz Hashim, Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm

On 01/13/2016 06:36 AM, Shiraz Hashim wrote:
> Hi Sasha,
> 
> On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote:
>> On Tue, 12 Jan 2016, Shiraz Hashim wrote:
>>
>>>> +       refresh_cpu_vm_stats(false);
>>>> +       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>>>>
>>>
>>> shouldn't this be cancel_delayed_work_sync ?
>>
>> Hmmm... This is executed with preemption off and the work is on the same
>> cpu. If it would be able to run concurrently then we would need this.
>>
>> Ok but it could run from the timer interrupt if that is still on and
>> occuring shortly before we go idle. Guess this needs to be similar to
>> the code we execute on cpu down in the vmstat notifiers (see
>> vmstat_cpuup_callback).
>>
>> Does this fix it? I have not been able to reproduce the issue so far.
>>
>> Patch against -next.
>>
>>
>>
>> Subject: vmstat: Use delayed work_sync and avoid loop.
>>
>> Signed-off-by: Christoph Lameter <cl@linux.com>
>>
>> Index: linux/mm/vmstat.c
>> ===================================================================
>> --- linux.orig/mm/vmstat.c
>> +++ linux/mm/vmstat.c
>> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void)
>>         if (system_state != SYSTEM_RUNNING)
>>                 return;
>>
>> -       do {
>> -               if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
>> -                       cancel_delayed_work(this_cpu_ptr(&vmstat_work));
>> -
>> -       } while (refresh_cpu_vm_stats(false));
>> +       refresh_cpu_vm_stats(false);
>> +       cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work));
>> +       cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>>  }
>>
>>  /*
> 
> Can you please give it a try, seems it is reproducing easily at your end.
> 

I'm seeing:

[3637853.902081] BUG: sleeping function called from invalid context at kernel/workqueue.c:2725
[3637853.904291] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[3637853.905488] no locks held by swapper/0/0.
[3637853.906387] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-next-20160114-sasha-00021-gf1273d1-dirty #2797
[3637853.908369]  1ffffffff5dc0f42 0f76d753b1b5b625 ffffffffaee07a90 ffffffffa2433eee
[3637853.909561]  0000000041b58ab3 ffffffffae960c30 ffffffffa2433e26 ffffffffaee49bc0
[3637853.911601]  0000000000000000 0000000000000000 0000000000000000 ffffffffaee07a78
[3637853.913281] Call Trace:
[3637853.913775]  [<ffffffffa2433eee>] dump_stack+0xc8/0x12a
[3637853.914649]  [<ffffffffa2433e26>] ? _atomic_dec_and_lock+0x106/0x106
[3637853.915824]  [<ffffffffa0440125>] ___might_sleep+0x2c5/0x480
[3637853.917056]  [<ffffffffa044033b>] __might_sleep+0x5b/0x260
[3637853.918310]  [<ffffffffa040a890>] flush_work+0xe0/0x820
[3637853.919270]  [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460
[3637853.921561]  [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460
[3637853.922898]  [<ffffffffa054d367>] ? del_timer+0x107/0x170
[3637853.924395]  [<ffffffffa054d260>] ? lock_timer_base+0x220/0x220
[3637853.925827]  [<ffffffffa0411445>] ? try_to_grab_pending+0x115/0x680
[3637853.926924]  [<ffffffffa0411b87>] ? __cancel_work_timer+0x1d7/0x5f0
[3637853.928215]  [<ffffffffa04d4b58>] ? mark_held_locks+0x1a8/0x220
[3637853.929075]  [<ffffffffa0411bdb>] __cancel_work_timer+0x22b/0x5f0
[3637853.930765]  [<ffffffffa04119b0>] ? try_to_grab_pending+0x680/0x680
[3637853.931642]  [<ffffffffa075fb60>] ? fill_contig_page_info+0x2e0/0x2e0
[3637853.932774]  [<ffffffffa0411fda>] cancel_delayed_work_sync+0x1a/0x20
[3637853.933897]  [<ffffffffa0766754>] quiet_vmstat+0x74/0x140
[3637853.934991]  [<ffffffffa04c20eb>] cpu_startup_entry+0x12b/0x6d0
[3637853.936174]  [<ffffffffa04c1fc0>] ? call_cpuidle+0x160/0x160
[3637853.937420]  [<ffffffffabfe3816>] rest_init+0x1d6/0x1e0
[3637853.938799]  [<ffffffffbb7e4c2a>] start_kernel+0x66c/0x6a6
[3637853.940742]  [<ffffffffbb7e45be>] ? thread_info_cache_init+0xb/0xb
[3637853.942416]  [<ffffffffbb9b0ae7>] ? memblock_reserve+0x59/0x5e
[3637853.943880]  [<ffffffffbb7e3120>] ? early_idt_handler_array+0x120/0x120
[3637853.945161]  [<ffffffffbb7e33c4>] x86_64_start_reservations+0x2a/0x2c
[3637853.946005]  [<ffffffffbb7e351d>] x86_64_start_kernel+0x157/0x17a

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2015-12-19  0:33 ` Sasha Levin
@ 2016-01-20 14:37   ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 14:37 UTC (permalink / raw)
  To: Sasha Levin, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton

[CCing Andrew]

I am just reading through this old discussion again because "vmstat:
make vmstat_updater deferrable again and shut down on idle" which seems
to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
any follow up fix merged to linus tree

On Fri 18-12-15 19:33:07, Sasha Levin wrote:
> Hi all,
> 
> I've started seeing the following in the latest -next kernel.
> 
> [  531.127489] kernel BUG at mm/vmstat.c:1408!
> [  531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> [  531.128872] Modules linked in:
> [  531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750
> [  531.130939] Workqueue: vmstat vmstat_update
> [  531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000
> [  531.133189] RIP: vmstat_update (mm/vmstat.c:1408)
> [  531.134466] RSP: 0018:ffff88020407fbf8  EFLAGS: 00010293
> [  531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000
> [  531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4
> [  531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230
> [  531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60
> [  531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0
> [  531.140261] FS:  0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000
> [  531.141218] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0
> [  531.142752] Stack:
> [  531.142963]  ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8
> [  531.144095]  ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb
> [  531.145258]  ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000
> [  531.146475] Call Trace:
> [  531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045)
> [  531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171)
> [  531.155176] kthread (kernel/kthread.c:209)
> [  531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469)
> [ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c
> All code
> ========
>    0:   75 1e                   jne    0x20
>    2:   be 79 00 00 00          mov    $0x79,%esi
>    7:   48 c7 c7 80 0f 10 8c    mov    $0xffffffff8c100f80,%rdi
>    e:   89 45 e4                mov    %eax,-0x1c(%rbp)
>   11:   e8 cd 92 cd ff          callq  0xffffffffffcd92e3
>   16:   8b 45 e4                mov    -0x1c(%rbp),%eax
>   19:   c6 05 e1 c4 13 1a 01    movb   $0x1,0x1a13c4e1(%rip)        # 0x1a13c501
>   20:   89 c0                   mov    %eax,%eax
>   22:   f0 48 0f ab 03          lock bts %rax,(%rbx)
>   27:   72 02                   jb     0x2b
>   29:   eb 0e                   jmp    0x39
>   2b:*  0f 0b                   ud2             <-- trapping instruction
>   2d:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
>   34:   e8 3d 03 ae 01          callq  0x1ae0376
>   39:   48 83 c4 08             add    $0x8,%rsp
>   3d:   5b                      pop    %rbx
>   3e:   41 5c                   pop    %r12
>         ...
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   0f 0b                   ud2
>    2:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
>    9:   e8 3d 03 ae 01          callq  0x1ae034b
>    e:   48 83 c4 08             add    $0x8,%rsp
>   12:   5b                      pop    %rbx
>   13:   41 5c                   pop    %r12
>         ...
> [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
> [  531.165523]  RSP <ffff88020407fbf8>
> 
> 
> Thanks,
> Sasha

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 14:37   ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 14:37 UTC (permalink / raw)
  To: Sasha Levin, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton

[CCing Andrew]

I am just reading through this old discussion again because "vmstat:
make vmstat_updater deferrable again and shut down on idle" which seems
to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
any follow up fix merged to linus tree

On Fri 18-12-15 19:33:07, Sasha Levin wrote:
> Hi all,
> 
> I've started seeing the following in the latest -next kernel.
> 
> [  531.127489] kernel BUG at mm/vmstat.c:1408!
> [  531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> [  531.128872] Modules linked in:
> [  531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750
> [  531.130939] Workqueue: vmstat vmstat_update
> [  531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000
> [  531.133189] RIP: vmstat_update (mm/vmstat.c:1408)
> [  531.134466] RSP: 0018:ffff88020407fbf8  EFLAGS: 00010293
> [  531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000
> [  531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4
> [  531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230
> [  531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60
> [  531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0
> [  531.140261] FS:  0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000
> [  531.141218] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0
> [  531.142752] Stack:
> [  531.142963]  ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8
> [  531.144095]  ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb
> [  531.145258]  ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000
> [  531.146475] Call Trace:
> [  531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045)
> [  531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171)
> [  531.155176] kthread (kernel/kthread.c:209)
> [  531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469)
> [ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c
> All code
> ========
>    0:   75 1e                   jne    0x20
>    2:   be 79 00 00 00          mov    $0x79,%esi
>    7:   48 c7 c7 80 0f 10 8c    mov    $0xffffffff8c100f80,%rdi
>    e:   89 45 e4                mov    %eax,-0x1c(%rbp)
>   11:   e8 cd 92 cd ff          callq  0xffffffffffcd92e3
>   16:   8b 45 e4                mov    -0x1c(%rbp),%eax
>   19:   c6 05 e1 c4 13 1a 01    movb   $0x1,0x1a13c4e1(%rip)        # 0x1a13c501
>   20:   89 c0                   mov    %eax,%eax
>   22:   f0 48 0f ab 03          lock bts %rax,(%rbx)
>   27:   72 02                   jb     0x2b
>   29:   eb 0e                   jmp    0x39
>   2b:*  0f 0b                   ud2             <-- trapping instruction
>   2d:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
>   34:   e8 3d 03 ae 01          callq  0x1ae0376
>   39:   48 83 c4 08             add    $0x8,%rsp
>   3d:   5b                      pop    %rbx
>   3e:   41 5c                   pop    %r12
>         ...
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   0f 0b                   ud2
>    2:   48 c7 c7 c0 f1 47 90    mov    $0xffffffff9047f1c0,%rdi
>    9:   e8 3d 03 ae 01          callq  0x1ae034b
>    e:   48 83 c4 08             add    $0x8,%rsp
>   12:   5b                      pop    %rbx
>   13:   41 5c                   pop    %r12
>         ...
> [  531.164630] RIP vmstat_update (mm/vmstat.c:1408)
> [  531.165523]  RSP <ffff88020407fbf8>
> 
> 
> Thanks,
> Sasha

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 14:37   ` Michal Hocko
@ 2016-01-20 14:56     ` Sasha Levin
  -1 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-20 14:56 UTC (permalink / raw)
  To: Michal Hocko, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton

On 01/20/2016 09:37 AM, Michal Hocko wrote:
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

So this isn't an "old" discussion - the bug is very much there and I can
hit it easily. As a workaround I've "disabled" vmstat.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 14:56     ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-20 14:56 UTC (permalink / raw)
  To: Michal Hocko, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton

On 01/20/2016 09:37 AM, Michal Hocko wrote:
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

So this isn't an "old" discussion - the bug is very much there and I can
hit it easily. As a workaround I've "disabled" vmstat.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 14:56     ` Sasha Levin
@ 2016-01-20 15:10       ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 15:10 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> So this isn't an "old" discussion - the bug is very much there and I can
> hit it easily. As a workaround I've "disabled" vmstat.

Well the report is since 18th Dec which is over month old. Should we
revert 0eb77e988032 as a pre caution and make sure this is done properly
in -mm tree. AFAIR none of the proposed fix worked without other
fallouts?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 15:10       ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 15:10 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> So this isn't an "old" discussion - the bug is very much there and I can
> hit it easily. As a workaround I've "disabled" vmstat.

Well the report is since 18th Dec which is over month old. Should we
revert 0eb77e988032 as a pre caution and make sure this is done properly
in -mm tree. AFAIR none of the proposed fix worked without other
fallouts?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 14:37   ` Michal Hocko
@ 2016-01-20 15:14     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 15:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Michal Hocko wrote:

> [CCing Andrew]
>
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

Is there any way to reproce this issue? This is running through trinity
right? Can we please get the exact syscall that causes this to occur?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 15:14     ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 15:14 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Michal Hocko wrote:

> [CCing Andrew]
>
> I am just reading through this old discussion again because "vmstat:
> make vmstat_updater deferrable again and shut down on idle" which seems
> to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> any follow up fix merged to linus tree

Is there any way to reproce this issue? This is running through trinity
right? Can we please get the exact syscall that causes this to occur?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 15:14     ` Christoph Lameter
@ 2016-01-20 15:20       ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 15:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 09:14:06, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > [CCing Andrew]
> >
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> Is there any way to reproce this issue? This is running through trinity
> right? Can we please get the exact syscall that causes this to occur?

As per the backtrace in the initial report this seems to be time
dependent as the crash happens from _kthread_ context. So it doesn't
seem to be directly related to any particular syscall.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 15:20       ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 15:20 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 09:14:06, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > [CCing Andrew]
> >
> > I am just reading through this old discussion again because "vmstat:
> > make vmstat_updater deferrable again and shut down on idle" which seems
> > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > any follow up fix merged to linus tree
> 
> Is there any way to reproce this issue? This is running through trinity
> right? Can we please get the exact syscall that causes this to occur?

As per the backtrace in the initial report this seems to be time
dependent as the crash happens from _kthread_ context. So it doesn't
seem to be directly related to any particular syscall.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 15:10       ` Michal Hocko
@ 2016-01-20 15:20         ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 15:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> > On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > > I am just reading through this old discussion again because "vmstat:
> > > make vmstat_updater deferrable again and shut down on idle" which seems
> > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > > any follow up fix merged to linus tree
> >
> > So this isn't an "old" discussion - the bug is very much there and I can
> > hit it easily. As a workaround I've "disabled" vmstat.
>
> Well the report is since 18th Dec which is over month old. Should we
> revert 0eb77e988032 as a pre caution and make sure this is done properly
> in -mm tree. AFAIR none of the proposed fix worked without other
> fallouts?

Seems that we are unable to get enough information to reproduce the issue?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 15:20         ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 15:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:56:26, Sasha Levin wrote:
> > On 01/20/2016 09:37 AM, Michal Hocko wrote:
> > > I am just reading through this old discussion again because "vmstat:
> > > make vmstat_updater deferrable again and shut down on idle" which seems
> > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
> > > any follow up fix merged to linus tree
> >
> > So this isn't an "old" discussion - the bug is very much there and I can
> > hit it easily. As a workaround I've "disabled" vmstat.
>
> Well the report is since 18th Dec which is over month old. Should we
> revert 0eb77e988032 as a pre caution and make sure this is done properly
> in -mm tree. AFAIR none of the proposed fix worked without other
> fallouts?

Seems that we are unable to get enough information to reproduce the issue?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 15:20         ` Christoph Lameter
@ 2016-01-20 15:49           ` Sasha Levin
  -1 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-20 15:49 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm, Andrew Morton

On 01/20/2016 10:20 AM, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
>> > On Wed 20-01-16 09:56:26, Sasha Levin wrote:
>>> > > On 01/20/2016 09:37 AM, Michal Hocko wrote:
>>>> > > > I am just reading through this old discussion again because "vmstat:
>>>> > > > make vmstat_updater deferrable again and shut down on idle" which seems
>>>> > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
>>>> > > > any follow up fix merged to linus tree
>>> > >
>>> > > So this isn't an "old" discussion - the bug is very much there and I can
>>> > > hit it easily. As a workaround I've "disabled" vmstat.
>> >
>> > Well the report is since 18th Dec which is over month old. Should we
>> > revert 0eb77e988032 as a pre caution and make sure this is done properly
>> > in -mm tree. AFAIR none of the proposed fix worked without other
>> > fallouts?
> Seems that we are unable to get enough information to reproduce the issue?

As I've mentioned - this reproduces frequently. I'd be happy to add in debug
information into the kernel that might help you reproduce it, but as it seems
like a timing issue, I can't provide a simple reproducer.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 15:49           ` Sasha Levin
  0 siblings, 0 replies; 114+ messages in thread
From: Sasha Levin @ 2016-01-20 15:49 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm, Andrew Morton

On 01/20/2016 10:20 AM, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
>> > On Wed 20-01-16 09:56:26, Sasha Levin wrote:
>>> > > On 01/20/2016 09:37 AM, Michal Hocko wrote:
>>>> > > > I am just reading through this old discussion again because "vmstat:
>>>> > > > make vmstat_updater deferrable again and shut down on idle" which seems
>>>> > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see
>>>> > > > any follow up fix merged to linus tree
>>> > >
>>> > > So this isn't an "old" discussion - the bug is very much there and I can
>>> > > hit it easily. As a workaround I've "disabled" vmstat.
>> >
>> > Well the report is since 18th Dec which is over month old. Should we
>> > revert 0eb77e988032 as a pre caution and make sure this is done properly
>> > in -mm tree. AFAIR none of the proposed fix worked without other
>> > fallouts?
> Seems that we are unable to get enough information to reproduce the issue?

As I've mentioned - this reproduces frequently. I'd be happy to add in debug
information into the kernel that might help you reproduce it, but as it seems
like a timing issue, I can't provide a simple reproducer.


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 15:49           ` Sasha Levin
@ 2016-01-20 15:55             ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 15:55 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Sasha Levin wrote:

>
> As I've mentioned - this reproduces frequently. I'd be happy to add in debug
> information into the kernel that might help you reproduce it, but as it seems
> like a timing issue, I can't provide a simple reproducer.

This isnt really important I think. Lets remove it.


Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Concurrent actions could be onlining and offlining of processors or be a
result of concurrency issues when updating the cpumask from multiple
processors.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 		 * Defer the checking for differentials to the
 		 * shepherd thread on a different processor.
 		 */
-		int r;
-		/*
-		 * Shepherd work thread does not race since it never
-		 * changes the bit if its zero but the cpu
-		 * online / off line code may race if
-		 * worker threads are still allowed during
-		 * shutdown / startup.
-		 */
-		r = cpumask_test_and_set_cpu(smp_processor_id(),
-			cpu_stat_off);
-		VM_BUG_ON(r);
+		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 	}
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 15:55             ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 15:55 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Sasha Levin wrote:

>
> As I've mentioned - this reproduces frequently. I'd be happy to add in debug
> information into the kernel that might help you reproduce it, but as it seems
> like a timing issue, I can't provide a simple reproducer.

This isnt really important I think. Lets remove it.


Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Concurrent actions could be onlining and offlining of processors or be a
result of concurrency issues when updating the cpumask from multiple
processors.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 		 * Defer the checking for differentials to the
 		 * shepherd thread on a different processor.
 		 */
-		int r;
-		/*
-		 * Shepherd work thread does not race since it never
-		 * changes the bit if its zero but the cpu
-		 * online / off line code may race if
-		 * worker threads are still allowed during
-		 * shutdown / startup.
-		 */
-		r = cpumask_test_and_set_cpu(smp_processor_id(),
-			cpu_stat_off);
-		VM_BUG_ON(r);
+		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 	}
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 15:55             ` Christoph Lameter
@ 2016-01-20 21:28               ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 21:28 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
[...]
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Concurrent actions could be onlining and offlining of processors or be a
> result of concurrency issues when updating the cpumask from multiple
> processors.

Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle) is merged the VM_BUG_ON is simply bogus because
vmstat_update might "race" with quiet_vmstat. The changelog should
reflect that. What about the following wording?

"
Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle")
CC: stable # 4.4+
"

> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>  		 * Defer the checking for differentials to the
>  		 * shepherd thread on a different processor.
>  		 */
> -		int r;
> -		/*
> -		 * Shepherd work thread does not race since it never
> -		 * changes the bit if its zero but the cpu
> -		 * online / off line code may race if
> -		 * worker threads are still allowed during
> -		 * shutdown / startup.
> -		 */
> -		r = cpumask_test_and_set_cpu(smp_processor_id(),
> -			cpu_stat_off);
> -		VM_BUG_ON(r);
> +		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>  	}
>  }

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 21:28               ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-20 21:28 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
[...]
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Concurrent actions could be onlining and offlining of processors or be a
> result of concurrency issues when updating the cpumask from multiple
> processors.

Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle) is merged the VM_BUG_ON is simply bogus because
vmstat_update might "race" with quiet_vmstat. The changelog should
reflect that. What about the following wording?

"
Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle")
CC: stable # 4.4+
"

> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>  		 * Defer the checking for differentials to the
>  		 * shepherd thread on a different processor.
>  		 */
> -		int r;
> -		/*
> -		 * Shepherd work thread does not race since it never
> -		 * changes the bit if its zero but the cpu
> -		 * online / off line code may race if
> -		 * worker threads are still allowed during
> -		 * shutdown / startup.
> -		 */
> -		r = cpumask_test_and_set_cpu(smp_processor_id(),
> -			cpu_stat_off);
> -		VM_BUG_ON(r);
> +		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>  	}
>  }

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 21:28               ` Michal Hocko
@ 2016-01-20 21:57                 ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 21:57 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> [...]
> > Subject: vmstat: Remove BUG_ON from vmstat_update
> >
> > If we detect that there is nothing to do just set the flag and do not check
> > if it was already set before. Races really do not matter. If the flag is
> > set by any code then the shepherd will start dealing with the situation
> > and reenable the vmstat workers when necessary again.
> >
> > Concurrent actions could be onlining and offlining of processors or be a
> > result of concurrency issues when updating the cpumask from multiple
> > processors.
>
> Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle) is merged the VM_BUG_ON is simply bogus because
> vmstat_update might "race" with quiet_vmstat. The changelog should
> reflect that. What about the following wording?

How can it race if preemption is off?

> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.


Hmmm... the vmstat_update can be interrupted while running and the cpu put
into idle mode? If vmstat_update is running then the cpu is not idle but
running code. If this is really going on then there is other stuff wrong
with the idling logic.

> Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle")
> CC: stable # 4.4+

?? There has not been an upstream release with this yet.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-20 21:57                 ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-20 21:57 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed, 20 Jan 2016, Michal Hocko wrote:

> On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> [...]
> > Subject: vmstat: Remove BUG_ON from vmstat_update
> >
> > If we detect that there is nothing to do just set the flag and do not check
> > if it was already set before. Races really do not matter. If the flag is
> > set by any code then the shepherd will start dealing with the situation
> > and reenable the vmstat workers when necessary again.
> >
> > Concurrent actions could be onlining and offlining of processors or be a
> > result of concurrency issues when updating the cpumask from multiple
> > processors.
>
> Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle) is merged the VM_BUG_ON is simply bogus because
> vmstat_update might "race" with quiet_vmstat. The changelog should
> reflect that. What about the following wording?

How can it race if preemption is off?

> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.


Hmmm... the vmstat_update can be interrupted while running and the cpu put
into idle mode? If vmstat_update is running then the cpu is not idle but
running code. If this is really going on then there is other stuff wrong
with the idling logic.

> Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle")
> CC: stable # 4.4+

?? There has not been an upstream release with this yet.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-20 21:57                 ` Christoph Lameter
@ 2016-01-21  8:24                   ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-21  8:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 15:57:43, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> > [...]
> > > Subject: vmstat: Remove BUG_ON from vmstat_update
> > >
> > > If we detect that there is nothing to do just set the flag and do not check
> > > if it was already set before. Races really do not matter. If the flag is
> > > set by any code then the shepherd will start dealing with the situation
> > > and reenable the vmstat workers when necessary again.
> > >
> > > Concurrent actions could be onlining and offlining of processors or be a
> > > result of concurrency issues when updating the cpumask from multiple
> > > processors.
> >
> > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle) is merged the VM_BUG_ON is simply bogus because
> > vmstat_update might "race" with quiet_vmstat. The changelog should
> > reflect that. What about the following wording?
> 
> How can it race if preemption is off?

See below.

> > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > particular cpu to be handled by vmstat_shepherd. This might trigger
> > a VM_BUG_ON in vmstat_update because the work item might have been
> > sleeping during the idle period and see the cpu_stat_off updated after
> > the wake up. The VM_BUG_ON is therefore misleading and no more
> > appropriate. Moreover it doesn't really suite any protection from real
> > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > anytime it sees a particular cpu set or vmstat_update would do the same
> > from the worker context directly. Even when the two would race the
> > result wouldn't be incorrect as the counters update is fully idempotent.
> 
> 
> Hmmm... the vmstat_update can be interrupted while running and the cpu put
> into idle mode? If vmstat_update is running then the cpu is not idle but
> running code. If this is really going on then there is other stuff wrong
> with the idling logic.

The vmstat update might be still waiting for its timer, idle mode started
and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
idle terminates and the originally schedule vmstate_update executes it
sees the bit set and BUG_ON.

> > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle")
> > CC: stable # 4.4+
> 
> ?? There has not been an upstream release with this yet.

Ohh, I thought it made it into 4.4 but you are right it is post 4.4 so
no CC: stable required.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-21  8:24                   ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-21  8:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Wed 20-01-16 15:57:43, Christoph Lameter wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
> 
> > On Wed 20-01-16 09:55:22, Christoph Lameter wrote:
> > [...]
> > > Subject: vmstat: Remove BUG_ON from vmstat_update
> > >
> > > If we detect that there is nothing to do just set the flag and do not check
> > > if it was already set before. Races really do not matter. If the flag is
> > > set by any code then the shepherd will start dealing with the situation
> > > and reenable the vmstat workers when necessary again.
> > >
> > > Concurrent actions could be onlining and offlining of processors or be a
> > > result of concurrency issues when updating the cpumask from multiple
> > > processors.
> >
> > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle) is merged the VM_BUG_ON is simply bogus because
> > vmstat_update might "race" with quiet_vmstat. The changelog should
> > reflect that. What about the following wording?
> 
> How can it race if preemption is off?

See below.

> > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > particular cpu to be handled by vmstat_shepherd. This might trigger
> > a VM_BUG_ON in vmstat_update because the work item might have been
> > sleeping during the idle period and see the cpu_stat_off updated after
> > the wake up. The VM_BUG_ON is therefore misleading and no more
> > appropriate. Moreover it doesn't really suite any protection from real
> > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > anytime it sees a particular cpu set or vmstat_update would do the same
> > from the worker context directly. Even when the two would race the
> > result wouldn't be incorrect as the counters update is fully idempotent.
> 
> 
> Hmmm... the vmstat_update can be interrupted while running and the cpu put
> into idle mode? If vmstat_update is running then the cpu is not idle but
> running code. If this is really going on then there is other stuff wrong
> with the idling logic.

The vmstat update might be still waiting for its timer, idle mode started
and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
idle terminates and the originally schedule vmstate_update executes it
sees the bit set and BUG_ON.

> > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > shut down on idle")
> > CC: stable # 4.4+
> 
> ?? There has not been an upstream release with this yet.

Ohh, I thought it made it into 4.4 but you are right it is post 4.4 so
no CC: stable required.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-21  8:24                   ` Michal Hocko
@ 2016-01-21 15:45                     ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-21 15:45 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu, 21 Jan 2016, Michal Hocko wrote:

> > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > > particular cpu to be handled by vmstat_shepherd. This might trigger
> > > a VM_BUG_ON in vmstat_update because the work item might have been
> > > sleeping during the idle period and see the cpu_stat_off updated after
> > > the wake up. The VM_BUG_ON is therefore misleading and no more
> > > appropriate. Moreover it doesn't really suite any protection from real
> > > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > > anytime it sees a particular cpu set or vmstat_update would do the same
> > > from the worker context directly. Even when the two would race the
> > > result wouldn't be incorrect as the counters update is fully idempotent.
> >
> >
> > Hmmm... the vmstat_update can be interrupted while running and the cpu put
> > into idle mode? If vmstat_update is running then the cpu is not idle but
> > running code. If this is really going on then there is other stuff wrong
> > with the idling logic.
>
> The vmstat update might be still waiting for its timer, idle mode started
> and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> idle terminates and the originally schedule vmstate_update executes it
> sees the bit set and BUG_ON.

Ok so we are going into idle mode and the vmstat_update timer is pending.
Then the timer will not fire since going idle switches preemption off.
quiet_vmstat will run without the chance of running vmstat_update

We could be going idle and not have disabled preemption yet. Then
vmstat_update will run. On return to the idling operation preemption will
be disabled and quiet_vmstat() will be run.

I do not see how these two things could race.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-21 15:45                     ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-21 15:45 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu, 21 Jan 2016, Michal Hocko wrote:

> > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> > > particular cpu to be handled by vmstat_shepherd. This might trigger
> > > a VM_BUG_ON in vmstat_update because the work item might have been
> > > sleeping during the idle period and see the cpu_stat_off updated after
> > > the wake up. The VM_BUG_ON is therefore misleading and no more
> > > appropriate. Moreover it doesn't really suite any protection from real
> > > bugs because vmstat_shepherd will simply reschedule the vmstat_work
> > > anytime it sees a particular cpu set or vmstat_update would do the same
> > > from the worker context directly. Even when the two would race the
> > > result wouldn't be incorrect as the counters update is fully idempotent.
> >
> >
> > Hmmm... the vmstat_update can be interrupted while running and the cpu put
> > into idle mode? If vmstat_update is running then the cpu is not idle but
> > running code. If this is really going on then there is other stuff wrong
> > with the idling logic.
>
> The vmstat update might be still waiting for its timer, idle mode started
> and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> idle terminates and the originally schedule vmstate_update executes it
> sees the bit set and BUG_ON.

Ok so we are going into idle mode and the vmstat_update timer is pending.
Then the timer will not fire since going idle switches preemption off.
quiet_vmstat will run without the chance of running vmstat_update

We could be going idle and not have disabled preemption yet. Then
vmstat_update will run. On return to the idling operation preemption will
be disabled and quiet_vmstat() will be run.

I do not see how these two things could race.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-21 15:45                     ` Christoph Lameter
@ 2016-01-21 16:51                       ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-21 16:51 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu 21-01-16 09:45:12, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
[...]
> > The vmstat update might be still waiting for its timer, idle mode started
> > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> > idle terminates and the originally schedule vmstate_update executes it
> > sees the bit set and BUG_ON.
> 
> Ok so we are going into idle mode and the vmstat_update timer is pending.
> Then the timer will not fire since going idle switches preemption off.
> quiet_vmstat will run without the chance of running vmstat_update
> 
> We could be going idle and not have disabled preemption yet. Then
> vmstat_update will run. On return to the idling operation preemption will
> be disabled and quiet_vmstat() will be run.
> 
> I do not see how these two things could race.

It goes like this:
CPU0:						CPU1
vmstat_update
  cpumask_test_and_set_cpu (0->1)
[...]
						vmstat_shepherd
<enter idle>					  cpumask_test_and_clear_cpu(CPU0) (1->0)
quiet_vmstat
  cpumask_test_and_set_cpu (0->1)
  						  queue_delayed_work_on(CPU0)
refresh_cpu_vm_stats()
[...]
vmstat_update
  nothing_to_do
  cpumask_test_and_set_cpu (1->1)
  VM_BUG_ON

Or am I missing something?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-21 16:51                       ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-21 16:51 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu 21-01-16 09:45:12, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
[...]
> > The vmstat update might be still waiting for its timer, idle mode started
> > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the
> > idle terminates and the originally schedule vmstate_update executes it
> > sees the bit set and BUG_ON.
> 
> Ok so we are going into idle mode and the vmstat_update timer is pending.
> Then the timer will not fire since going idle switches preemption off.
> quiet_vmstat will run without the chance of running vmstat_update
> 
> We could be going idle and not have disabled preemption yet. Then
> vmstat_update will run. On return to the idling operation preemption will
> be disabled and quiet_vmstat() will be run.
> 
> I do not see how these two things could race.

It goes like this:
CPU0:						CPU1
vmstat_update
  cpumask_test_and_set_cpu (0->1)
[...]
						vmstat_shepherd
<enter idle>					  cpumask_test_and_clear_cpu(CPU0) (1->0)
quiet_vmstat
  cpumask_test_and_set_cpu (0->1)
  						  queue_delayed_work_on(CPU0)
refresh_cpu_vm_stats()
[...]
vmstat_update
  nothing_to_do
  cpumask_test_and_set_cpu (1->1)
  VM_BUG_ON

Or am I missing something?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-21 16:51                       ` Michal Hocko
@ 2016-01-21 17:38                         ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-21 17:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu, 21 Jan 2016, Michal Hocko wrote:

> It goes like this:
> CPU0:						CPU1
> vmstat_update
>   cpumask_test_and_set_cpu (0->1)
> [...]
> 						vmstat_shepherd
> <enter idle>					  cpumask_test_and_clear_cpu(CPU0) (1->0)
> quiet_vmstat
>   cpumask_test_and_set_cpu (0->1)
>   						  queue_delayed_work_on(CPU0)
> refresh_cpu_vm_stats()
> [...]
> vmstat_update
>   nothing_to_do
>   cpumask_test_and_set_cpu (1->1)
>   VM_BUG_ON
>
> Or am I missing something?

Ok then the following should fix it:



Subject: vmstat: Queue work before clearing cpu_stat_off

There is a race between vmstat_shepherd and quiet_vmstat() because
the responsibility for checking for counter updates changes depending
on the state of teh bit in cpu_stat_off. So queue the work before
changing state of the bit in vmstat_shepherd. That way quiet_vmstat
is guaranteed to remove the work request when clearing the bit and the
bug in vmstat_update wont trigger anymore.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
-		if (need_update(cpu) &&
-			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+		if (need_update(cpu)) {

 			queue_delayed_work_on(cpu, vmstat_wq,
 				&per_cpu(vmstat_work, cpu), 0);

+			cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
+		}
+
 	put_online_cpus();

 	schedule_delayed_work(&shepherd,

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-21 17:38                         ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-21 17:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu, 21 Jan 2016, Michal Hocko wrote:

> It goes like this:
> CPU0:						CPU1
> vmstat_update
>   cpumask_test_and_set_cpu (0->1)
> [...]
> 						vmstat_shepherd
> <enter idle>					  cpumask_test_and_clear_cpu(CPU0) (1->0)
> quiet_vmstat
>   cpumask_test_and_set_cpu (0->1)
>   						  queue_delayed_work_on(CPU0)
> refresh_cpu_vm_stats()
> [...]
> vmstat_update
>   nothing_to_do
>   cpumask_test_and_set_cpu (1->1)
>   VM_BUG_ON
>
> Or am I missing something?

Ok then the following should fix it:



Subject: vmstat: Queue work before clearing cpu_stat_off

There is a race between vmstat_shepherd and quiet_vmstat() because
the responsibility for checking for counter updates changes depending
on the state of teh bit in cpu_stat_off. So queue the work before
changing state of the bit in vmstat_shepherd. That way quiet_vmstat
is guaranteed to remove the work request when clearing the bit and the
bug in vmstat_update wont trigger anymore.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
-		if (need_update(cpu) &&
-			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+		if (need_update(cpu)) {

 			queue_delayed_work_on(cpu, vmstat_wq,
 				&per_cpu(vmstat_work, cpu), 0);

+			cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
+		}
+
 	put_online_cpus();

 	schedule_delayed_work(&shepherd,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-21 17:38                         ` Christoph Lameter
@ 2016-01-22 11:00                           ` Shiraz Hashim
  -1 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-22 11:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu, Jan 21, 2016 at 11:08 PM, Christoph Lameter <cl@linux.com> wrote:
> Subject: vmstat: Queue work before clearing cpu_stat_off
>
> There is a race between vmstat_shepherd and quiet_vmstat() because
> the responsibility for checking for counter updates changes depending
> on the state of teh bit in cpu_stat_off. So queue the work before
> changing state of the bit in vmstat_shepherd. That way quiet_vmstat
> is guaranteed to remove the work request when clearing the bit and the
> bug in vmstat_update wont trigger anymore.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
>         get_online_cpus();
>         /* Check processors whose vmstat worker threads have been disabled */
>         for_each_cpu(cpu, cpu_stat_off)
> -               if (need_update(cpu) &&
> -                       cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> +               if (need_update(cpu)) {
>
>                         queue_delayed_work_on(cpu, vmstat_wq,
>                                 &per_cpu(vmstat_work, cpu), 0);
>
> +                       cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
> +               }
> +
>         put_online_cpus();
>
>         schedule_delayed_work(&shepherd,


This can alternatively lead to following where vmstat may not be
scheduled for cpu  when it is back from idle.

CPU0:                                            CPU1:
                                                       vmstat_shepherd
<enter idle>                                    queue_delayed_work_on(CPU0)
quiet_vmstat
  cancel_delayed_work
  cpumask_test_and_set_cpu (0->1)

cpumask_clear_cpu(CPU0) (1->0)

-- 
regards
Shiraz Hashim

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-22 11:00                           ` Shiraz Hashim
  0 siblings, 0 replies; 114+ messages in thread
From: Shiraz Hashim @ 2016-01-22 11:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu, Jan 21, 2016 at 11:08 PM, Christoph Lameter <cl@linux.com> wrote:
> Subject: vmstat: Queue work before clearing cpu_stat_off
>
> There is a race between vmstat_shepherd and quiet_vmstat() because
> the responsibility for checking for counter updates changes depending
> on the state of teh bit in cpu_stat_off. So queue the work before
> changing state of the bit in vmstat_shepherd. That way quiet_vmstat
> is guaranteed to remove the work request when clearing the bit and the
> bug in vmstat_update wont trigger anymore.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_
>         get_online_cpus();
>         /* Check processors whose vmstat worker threads have been disabled */
>         for_each_cpu(cpu, cpu_stat_off)
> -               if (need_update(cpu) &&
> -                       cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> +               if (need_update(cpu)) {
>
>                         queue_delayed_work_on(cpu, vmstat_wq,
>                                 &per_cpu(vmstat_work, cpu), 0);
>
> +                       cpumask_clear_cpu(smp_processor_id(), cpu_stat_off);
> +               }
> +
>         put_online_cpus();
>
>         schedule_delayed_work(&shepherd,


This can alternatively lead to following where vmstat may not be
scheduled for cpu  when it is back from idle.

CPU0:                                            CPU1:
                                                       vmstat_shepherd
<enter idle>                                    queue_delayed_work_on(CPU0)
quiet_vmstat
  cancel_delayed_work
  cpumask_test_and_set_cpu (0->1)

cpumask_clear_cpu(CPU0) (1->0)

-- 
regards
Shiraz Hashim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-21 17:38                         ` Christoph Lameter
@ 2016-01-22 14:04                           ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-22 14:04 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu 21-01-16 11:38:46, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
> 
> > It goes like this:
> > CPU0:						CPU1
> > vmstat_update
> >   cpumask_test_and_set_cpu (0->1)
> > [...]
> > 						vmstat_shepherd
> > <enter idle>					  cpumask_test_and_clear_cpu(CPU0) (1->0)
> > quiet_vmstat
> >   cpumask_test_and_set_cpu (0->1)
> >   						  queue_delayed_work_on(CPU0)
> > refresh_cpu_vm_stats()
> > [...]
> > vmstat_update
> >   nothing_to_do
> >   cpumask_test_and_set_cpu (1->1)
> >   VM_BUG_ON
> >
> > Or am I missing something?
> 
> Ok then the following should fix it:

Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
What is the point of keeping it in the first place. The code can
perfectly cope with the race.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-22 14:04                           ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-22 14:04 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Thu 21-01-16 11:38:46, Christoph Lameter wrote:
> On Thu, 21 Jan 2016, Michal Hocko wrote:
> 
> > It goes like this:
> > CPU0:						CPU1
> > vmstat_update
> >   cpumask_test_and_set_cpu (0->1)
> > [...]
> > 						vmstat_shepherd
> > <enter idle>					  cpumask_test_and_clear_cpu(CPU0) (1->0)
> > quiet_vmstat
> >   cpumask_test_and_set_cpu (0->1)
> >   						  queue_delayed_work_on(CPU0)
> > refresh_cpu_vm_stats()
> > [...]
> > vmstat_update
> >   nothing_to_do
> >   cpumask_test_and_set_cpu (1->1)
> >   VM_BUG_ON
> >
> > Or am I missing something?
> 
> Ok then the following should fix it:

Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
What is the point of keeping it in the first place. The code can
perfectly cope with the race.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-22 14:04                           ` Michal Hocko
@ 2016-01-22 16:07                             ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-22 16:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri, 22 Jan 2016, Michal Hocko wrote:

> Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> What is the point of keeping it in the first place. The code can
> perfectly cope with the race.

Ok then lets do that.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-22 16:07                             ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-22 16:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri, 22 Jan 2016, Michal Hocko wrote:

> Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> What is the point of keeping it in the first place. The code can
> perfectly cope with the race.

Ok then lets do that.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-22 16:07                             ` Christoph Lameter
@ 2016-01-22 16:12                               ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-22 16:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri 22-01-16 10:07:01, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> > What is the point of keeping it in the first place. The code can
> > perfectly cope with the race.
> 
> Ok then lets do that.

Could you repost the patch with the updated description?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-22 16:12                               ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-22 16:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri 22-01-16 10:07:01, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON?
> > What is the point of keeping it in the first place. The code can
> > perfectly cope with the race.
> 
> Ok then lets do that.

Could you repost the patch with the updated description?

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-22 16:12                               ` Michal Hocko
@ 2016-01-22 16:46                                 ` Christoph Lameter
  -1 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-22 16:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri, 22 Jan 2016, Michal Hocko wrote:

> Could you repost the patch with the updated description?

Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 		 * Defer the checking for differentials to the
 		 * shepherd thread on a different processor.
 		 */
-		int r;
-		/*
-		 * Shepherd work thread does not race since it never
-		 * changes the bit if its zero but the cpu
-		 * online / off line code may race if
-		 * worker threads are still allowed during
-		 * shutdown / startup.
-		 */
-		r = cpumask_test_and_set_cpu(smp_processor_id(),
-			cpu_stat_off);
-		VM_BUG_ON(r);
+		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 	}
 }

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-22 16:46                                 ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-22 16:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri, 22 Jan 2016, Michal Hocko wrote:

> Could you repost the patch with the updated description?

Subject: vmstat: Remove BUG_ON from vmstat_update

If we detect that there is nothing to do just set the flag and do not check
if it was already set before. Races really do not matter. If the flag is
set by any code then the shepherd will start dealing with the situation
and reenable the vmstat workers when necessary again.

Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
particular cpu to be handled by vmstat_shepherd. This might trigger
a VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up. The VM_BUG_ON is therefore misleading and no more
appropriate. Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly. Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/vmstat.c
===================================================================
--- linux.orig/mm/vmstat.c
+++ linux/mm/vmstat.c
@@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
 		 * Defer the checking for differentials to the
 		 * shepherd thread on a different processor.
 		 */
-		int r;
-		/*
-		 * Shepherd work thread does not race since it never
-		 * changes the bit if its zero but the cpu
-		 * online / off line code may race if
-		 * worker threads are still allowed during
-		 * shutdown / startup.
-		 */
-		r = cpumask_test_and_set_cpu(smp_processor_id(),
-			cpu_stat_off);
-		VM_BUG_ON(r);
+		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
 	}
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-22 16:46                                 ` Christoph Lameter
@ 2016-01-22 17:12                                   ` Michal Hocko
  -1 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-22 17:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri 22-01-16 10:46:14, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Could you repost the patch with the updated description?
> 
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.
>

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle"
Reported-by: Sasha Levin <sasha.levin@oracle.com>

Would be appropriate IMO

> Signed-off-by: Christoph Lameter <cl@linux.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>  		 * Defer the checking for differentials to the
>  		 * shepherd thread on a different processor.
>  		 */
> -		int r;
> -		/*
> -		 * Shepherd work thread does not race since it never
> -		 * changes the bit if its zero but the cpu
> -		 * online / off line code may race if
> -		 * worker threads are still allowed during
> -		 * shutdown / startup.
> -		 */
> -		r = cpumask_test_and_set_cpu(smp_processor_id(),
> -			cpu_stat_off);
> -		VM_BUG_ON(r);
> +		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>  	}
>  }

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-22 17:12                                   ` Michal Hocko
  0 siblings, 0 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-22 17:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri 22-01-16 10:46:14, Christoph Lameter wrote:
> On Fri, 22 Jan 2016, Michal Hocko wrote:
> 
> > Could you repost the patch with the updated description?
> 
> Subject: vmstat: Remove BUG_ON from vmstat_update
> 
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. Races really do not matter. If the flag is
> set by any code then the shepherd will start dealing with the situation
> and reenable the vmstat workers when necessary again.
> 
> Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
> shut down on idle") quiet_vmstat might update cpu_stat_off and mark a
> particular cpu to be handled by vmstat_shepherd. This might trigger
> a VM_BUG_ON in vmstat_update because the work item might have been
> sleeping during the idle period and see the cpu_stat_off updated after
> the wake up. The VM_BUG_ON is therefore misleading and no more
> appropriate. Moreover it doesn't really suite any protection from real
> bugs because vmstat_shepherd will simply reschedule the vmstat_work
> anytime it sees a particular cpu set or vmstat_update would do the same
> from the worker context directly. Even when the two would race the
> result wouldn't be incorrect as the counters update is fully idempotent.
>

Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle"
Reported-by: Sasha Levin <sasha.levin@oracle.com>

Would be appropriate IMO

> Signed-off-by: Christoph Lameter <cl@linux.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> 
> Index: linux/mm/vmstat.c
> ===================================================================
> --- linux.orig/mm/vmstat.c
> +++ linux/mm/vmstat.c
> @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st
>  		 * Defer the checking for differentials to the
>  		 * shepherd thread on a different processor.
>  		 */
> -		int r;
> -		/*
> -		 * Shepherd work thread does not race since it never
> -		 * changes the bit if its zero but the cpu
> -		 * online / off line code may race if
> -		 * worker threads are still allowed during
> -		 * shutdown / startup.
> -		 */
> -		r = cpumask_test_and_set_cpu(smp_processor_id(),
> -			cpu_stat_off);
> -		VM_BUG_ON(r);
> +		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
>  	}
>  }

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-22 16:46                                 ` Christoph Lameter
  (?)
  (?)
@ 2016-01-23 16:21                                 ` Mike Galbraith
  2016-01-24  0:33                                   ` Christoph Lameter
                                                     ` (2 more replies)
  -1 siblings, 3 replies; 114+ messages in thread
From: Mike Galbraith @ 2016-01-23 16:21 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Peter Zijlstra, LKML

Hi Christoph,

While you're fixing that commit up, can you perhaps find a better home
for quiet_vmstat()?  It not only munches cycles when switching cross
-core mightily, for -rt it injects a sleeping lock into the idle task.

    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule                  
     4.70%  [kernel]       [k] mutex_unlock                
     3.14%  [kernel]       [k] __switch_to                 

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-23 16:21                                 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith
@ 2016-01-24  0:33                                   ` Christoph Lameter
  2016-01-24  2:46                                     ` Mike Galbraith
  2016-01-25 17:42                                   ` Michal Hocko
  2016-01-27 16:48                                   ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko
  2 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-24  0:33 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Peter Zijlstra, LKML

On Sat, 23 Jan 2016, Mike Galbraith wrote:

> While you're fixing that commit up, can you perhaps find a better home
> for quiet_vmstat()?  It not only munches cycles when switching cross
> -core mightily, for -rt it injects a sleeping lock into the idle task.

Not sure what you are talking about. No sleeping locks are used in
quiet_vmstat() nor does it switch across cores. It would be broken if it
would do so.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-24  0:33                                   ` Christoph Lameter
@ 2016-01-24  2:46                                     ` Mike Galbraith
  2016-01-24  3:46                                       ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-24  2:46 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Peter Zijlstra, LKML

On Sat, 2016-01-23 at 18:33 -0600, Christoph Lameter wrote:
> On Sat, 23 Jan 2016, Mike Galbraith wrote:
> 
> > While you're fixing that commit up, can you perhaps find a better home
> > for quiet_vmstat()?  It not only munches cycles when switching cross
> > -core mightily, for -rt it injects a sleeping lock into the idle task.
> 
> Not sure what you are talking about. No sleeping locks are used in
> quiet_vmstat() nor does it switch across cores. It would be broken if it
> would do so.

By switching cross-core, I'm referring to scheduling of communicating
tasks.

The perf top snippet...

    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule                  
     4.70%  [kernel]       [k] mutex_unlock                
     3.14%  [kernel]       [k] __switch_to

... was pipe-test, an unrealistic microbench, but the same will happen
at lower frequency in the real world.

Here's the sleeping lock for -rt:

[    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
[    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000
[    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0
[    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88
[    2.283941] Call Trace:
[    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
[    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
[    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
[    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
[    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0
[    2.289094]  [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0
[    2.289949]  [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0
[    2.290800]  [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20
[    2.291647]  [<ffffffff81036164>] start_secondary+0x114/0x140

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-24  2:46                                     ` Mike Galbraith
@ 2016-01-24  3:46                                       ` Christoph Lameter
  2016-01-24  5:36                                         ` Mike Galbraith
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-24  3:46 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Peter Zijlstra, LKML

On Sun, 24 Jan 2016, Mike Galbraith wrote:

> By switching cross-core, I'm referring to scheduling of communicating
> tasks.

??? Its cancelling a work request. That is a "communicating task"?

> Here's the sleeping lock for -rt:
>
> [    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
> [    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000
> [    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0
> [    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88
> [    2.283941] Call Trace:
> [    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
> [    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
> [    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
> [    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
> [    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0

OMG cancelling a work request causes a sleeping lock to be taken?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-24  3:46                                       ` Christoph Lameter
@ 2016-01-24  5:36                                         ` Mike Galbraith
  0 siblings, 0 replies; 114+ messages in thread
From: Mike Galbraith @ 2016-01-24  5:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Peter Zijlstra, LKML

On Sat, 2016-01-23 at 21:46 -0600, Christoph Lameter wrote:
> On Sun, 24 Jan 2016, Mike Galbraith wrote:
> 
> > By switching cross-core, I'm referring to scheduling of communicating
> > tasks.
> 
> ??? Its cancelling a work request. That is a "communicating task"?

No no no, pipe-test is two tasks playing ping-pong via a pipe.  They
are about as synchronous as it gets, so each task goes idle at high
frequency.  Idle is fastpath.

> > Here's the sleeping lock for -rt:
> > 
> > [    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
> > [    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> > [    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000
> > [    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0
> > [    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88
> > [    2.283941] Call Trace:
> > [    2.284797]  [] dump_stack+0x49/0x67
> > [    2.285658]  [] ___might_sleep+0xf5/0x180
> > [    2.286521]  [] rt_spin_lock+0x20/0x50
> > [    2.287382]  [] try_to_grab_pending+0x69/0x240
> > [    2.288239]  [] cancel_delayed_work+0x26/0xe0
> 
> OMG cancelling a work request causes a sleeping lock to be taken?

Yup.  In -rt, spinlocks that are not raw are transformed into rtmutex.

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
  2016-01-22 16:46                                 ` Christoph Lameter
@ 2016-01-24 16:57                                   ` Linus Torvalds
  -1 siblings, 0 replies; 114+ messages in thread
From: Linus Torvalds @ 2016-01-24 16:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri, Jan 22, 2016 at 8:46 AM, Christoph Lameter <cl@linux.com> wrote:
>
> Subject: vmstat: Remove BUG_ON from vmstat_update
>
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. [..]

Ok, I am assuming this is in Andrew's queue already, but this bug hit
my machine overnight, so I'm applying it directly..

                Linus

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408!
@ 2016-01-24 16:57                                   ` Linus Torvalds
  0 siblings, 0 replies; 114+ messages in thread
From: Linus Torvalds @ 2016-01-24 16:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton

On Fri, Jan 22, 2016 at 8:46 AM, Christoph Lameter <cl@linux.com> wrote:
>
> Subject: vmstat: Remove BUG_ON from vmstat_update
>
> If we detect that there is nothing to do just set the flag and do not check
> if it was already set before. [..]

Ok, I am assuming this is in Andrew's queue already, but this bug hit
my machine overnight, so I'm applying it directly..

                Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-23 16:21                                 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith
  2016-01-24  0:33                                   ` Christoph Lameter
@ 2016-01-25 17:42                                   ` Michal Hocko
  2016-01-25 18:02                                     ` Christoph Lameter
  2016-01-27 16:48                                   ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko
  2 siblings, 1 reply; 114+ messages in thread
From: Michal Hocko @ 2016-01-25 17:42 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> Hi Christoph,
> 
> While you're fixing that commit up, can you perhaps find a better home
> for quiet_vmstat()?  It not only munches cycles when switching cross
> -core mightily, for -rt it injects a sleeping lock into the idle task.
> 
>     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
>      4.75%  [kernel]       [k] __schedule                  
>      4.70%  [kernel]       [k] mutex_unlock                
>      3.14%  [kernel]       [k] __switch_to                 

Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
such a large footprint. I guess this would be just an expensive noop
because we have to check all the zones*counters and do an expensive
this_cpu_xchg. Is the whole deferred thing worth this overhead?

0eb77e988032 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") doesn't talk about any numbers and neither does
39bf6270f524 ("VM statistics: Make timer deferrable").

But even when refresh_cpu_vm_stats is made more effective there would
still be the problem for RT and the work canceling, though. We would
need much more changes to make this RT ready (both timer and WQ code
would need some changes AFAIU).

Unless there is a clear and huge win from doing the vmstat update
deferrable then I think a revert is more appropriate IMHO.
-- 
Michal Hocko
SUSE Labs 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-25 17:42                                   ` Michal Hocko
@ 2016-01-25 18:02                                     ` Christoph Lameter
  2016-01-25 20:13                                       ` Michal Hocko
  2016-01-26  2:14                                       ` Mike Galbraith
  0 siblings, 2 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-25 18:02 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Mon, 25 Jan 2016, Michal Hocko wrote:

> On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > Hi Christoph,
> >
> > While you're fixing that commit up, can you perhaps find a better home
> > for quiet_vmstat()?  It not only munches cycles when switching cross
> > -core mightily, for -rt it injects a sleeping lock into the idle task.
> >
> >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> >      4.75%  [kernel]       [k] __schedule
> >      4.70%  [kernel]       [k] mutex_unlock
> >      3.14%  [kernel]       [k] __switch_to
>
> Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> such a large footprint. I guess this would be just an expensive noop
> because we have to check all the zones*counters and do an expensive
> this_cpu_xchg. Is the whole deferred thing worth this overhead?

Why would the deferring cause this overhead?

Also there is no cross core activity from quiet_vmstat(). It simply
disables the local vmstat updates.

> Unless there is a clear and huge win from doing the vmstat update
> deferrable then I think a revert is more appropriate IMHO.

It reduces the OS events that the application experiences by folding it
into the tick events. If its not deferrable then a timer event will be
generated in addition to the tick. We do not want that.

Workqueues are used in many places. If RT can sleep within workqueue
management functions then spinlocks cannot be taken anymore and there may
be issues with preemption.

The regression that I know of (independent of "RT") is due as far as I
know due to the switch of the parameters of some vmstat functions to 64
bit instead of 32 bit.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-25 18:02                                     ` Christoph Lameter
@ 2016-01-25 20:13                                       ` Michal Hocko
  2016-01-26 16:25                                         ` Christoph Lameter
  2016-01-26  2:14                                       ` Mike Galbraith
  1 sibling, 1 reply; 114+ messages in thread
From: Michal Hocko @ 2016-01-25 20:13 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Mon 25-01-16 12:02:06, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
> 
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > >
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()?  It not only munches cycles when switching cross
> > > -core mightily, for -rt it injects a sleeping lock into the idle task.
> > >
> > >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> > >      4.75%  [kernel]       [k] __schedule
> > >      4.70%  [kernel]       [k] mutex_unlock
> > >      3.14%  [kernel]       [k] __switch_to
> >
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
> 
> Why would the deferring cause this overhead?

I guess the profile speaks for itself, doesn't it?

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

It doesn't go cross core but it still does nr_zones * counters atomic
ops.

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
> 
> It reduces the OS events that the application experiences by folding it
> into the tick events. If its not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Yes this is what I have read in the changelog. But "how much" part is
really missing. Is this even quantifiable?

> Workqueues are used in many places. If RT can sleep within workqueue
> management functions then spinlocks cannot be taken anymore and there may
> be issues with preemption.

RT can sleep in _any_ spinlock except for raw spin locks. Even though
the !RT kernel is not sleeping doesn't really matter much because
cancel_delayed_work is quite a heavy function which shouldn't be called
from the idle context AFAIU. Sure most of the time it will boil down to
del_timer but it can hit the slowpath as well if the timer got migrated
to a different CPU and we have to race with the WQ pool management IIUC.

Maybe this overhead can be reduced by outsourcing the functionality to
vmstat_shepherd which can check idle CPUs, cancel the timer for them
update the differentials and put them to cpu_stat_off? 

> The regression that I know of (independent of "RT") is due as far as I
> know due to the switch of the parameters of some vmstat functions to 64
> bit instead of 32 bit.

I am not sure I am following.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-25 18:02                                     ` Christoph Lameter
  2016-01-25 20:13                                       ` Michal Hocko
@ 2016-01-26  2:14                                       ` Mike Galbraith
  2016-01-26  2:25                                         ` Mike Galbraith
  2016-01-26 16:26                                         ` Christoph Lameter
  1 sibling, 2 replies; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26  2:14 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: Peter Zijlstra, LKML

On Mon, 2016-01-25 at 12:02 -0600, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
> 
> > On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > > Hi Christoph,
> > > 
> > > While you're fixing that commit up, can you perhaps find a better home
> > > for quiet_vmstat()?  It not only munches cycles when switching cross
> > > -core mightily, for -rt it injects a sleeping lock into the idle task.
> > > 
> > >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> > >      4.75%  [kernel]       [k] __schedule
> > >      4.70%  [kernel]       [k] mutex_unlock
> > >      3.14%  [kernel]       [k] __switch_to
> > 
> > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have
> > such a large footprint. I guess this would be just an expensive noop
> > because we have to check all the zones*counters and do an expensive
> > this_cpu_xchg. Is the whole deferred thing worth this overhead?
> 
> Why would the deferring cause this overhead?

Because we schedule to idle cores aggressively, thus we may pop in and
out of idle at high frequency.

> Also there is no cross core activity from quiet_vmstat(). It simply
> disables the local vmstat updates.

Again, the cross core activity is not due to quiet_vmstat(), it is due
to pipe-test threads running on two cores, and meeting quiet_vmstat()
at high frequency.

> > Unless there is a clear and huge win from doing the vmstat update
> > deferrable then I think a revert is more appropriate IMHO.
> 
> It reduces the OS events that the application experiences by folding it
> into the tick events. If its not deferrable then a timer event will be
> generated in addition to the tick. We do not want that.

Perf and RT say we don't want quiet_vmstat() in the idle loop either.

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26  2:14                                       ` Mike Galbraith
@ 2016-01-26  2:25                                         ` Mike Galbraith
  2016-01-26 16:26                                           ` Christoph Lameter
  2016-01-26 16:26                                         ` Christoph Lameter
  1 sibling, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26  2:25 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: Peter Zijlstra, LKML

On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote:

> Perf and RT say we don't want quiet_vmstat() in the idle loop either.

BTW, the perf numbers were not from an RT kernel, they were from my
PREEMPT_VOLUNTARY desktop kernel.

	-Mike 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-25 20:13                                       ` Michal Hocko
@ 2016-01-26 16:25                                         ` Christoph Lameter
  2016-01-26 18:31                                           ` Mike Galbraith
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 16:25 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Mon, 25 Jan 2016, Michal Hocko wrote:

> > Why would the deferring cause this overhead?
>
> I guess the profile speaks for itself, doesn't it?

But the system is going idle? Why would this impact performance?

> > Also there is no cross core activity from quiet_vmstat(). It simply
> > disables the local vmstat updates.
>
> It doesn't go cross core but it still does nr_zones * counters atomic
> ops.

If there are updates then yes.

> > It reduces the OS events that the application experiences by folding it
> > into the tick events. If its not deferrable then a timer event will be
> > generated in addition to the tick. We do not want that.
>
> Yes this is what I have read in the changelog. But "how much" part is
> really missing. Is this even quantifiable?

Oh yes. If you want to have low latency responses yes. And event like that
causes a couple of microseconds delay which will cause a moneytary impact
if you have to react to stock trading events.

> Maybe this overhead can be reduced by outsourcing the functionality to
> vmstat_shepherd which can check idle CPUs, cancel the timer for them
> update the differentials and put them to cpu_stat_off?

Remote updating of differentials is problematic due to the counters being
per cpu values that are expected to only be updated from the cpu that
"owns" them.

> > The regression that I know of (independent of "RT") is due as far as I
> > know due to the switch of the parameters of some vmstat functions to 64
> > bit instead of 32 bit.
>
> I am not sure I am following.

An additional patch was merged for 4.5 that increases the arguments to the
counter operations to 64 bit which is known for regressions.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26  2:14                                       ` Mike Galbraith
  2016-01-26  2:25                                         ` Mike Galbraith
@ 2016-01-26 16:26                                         ` Christoph Lameter
  2016-01-26 17:08                                           ` Mike Galbraith
  1 sibling, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 16:26 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> > Why would the deferring cause this overhead?
>
> Because we schedule to idle cores aggressively, thus we may pop in and
> out of idle at high frequency.

Whats the point of going idle if you have things to do soon?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26  2:25                                         ` Mike Galbraith
@ 2016-01-26 16:26                                           ` Christoph Lameter
  2016-01-26 17:39                                             ` Mike Galbraith
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 16:26 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote:
>
> > Perf and RT say we don't want quiet_vmstat() in the idle loop either.
>
> BTW, the perf numbers were not from an RT kernel, they were from my
> PREEMPT_VOLUNTARY desktop kernel.

Can we move quiet_vmstat() elsewhere after we have checked that really
nothing else is going on soon?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 16:26                                         ` Christoph Lameter
@ 2016-01-26 17:08                                           ` Mike Galbraith
  2016-01-26 18:22                                             ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26 17:08 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote:
> On Tue, 26 Jan 2016, Mike Galbraith wrote:
> 
> > > Why would the deferring cause this overhead?
> > 
> > Because we schedule to idle cores aggressively, thus we may pop in and
> > out of idle at high frequency.
> 
> Whats the point of going idle if you have things to do soon?

When a task schedules off, how do you know it'll be back at all, much
less soon?

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 16:26                                           ` Christoph Lameter
@ 2016-01-26 17:39                                             ` Mike Galbraith
  2016-01-26 18:19                                               ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26 17:39 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote:
> On Tue, 26 Jan 2016, Mike Galbraith wrote:
> 
> > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote:
> > 
> > > Perf and RT say we don't want quiet_vmstat() in the idle loop
> > > either.
> > 
> > BTW, the perf numbers were not from an RT kernel, they were from my
> > PREEMPT_VOLUNTARY desktop kernel.
> 
> Can we move quiet_vmstat() elsewhere after we have checked that really
> nothing else is going on soon?

How would you check?  Precognition doesn't work for mortals.

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 17:39                                             ` Mike Galbraith
@ 2016-01-26 18:19                                               ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 18:19 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote:
> > On Tue, 26 Jan 2016, Mike Galbraith wrote:
> >
> > > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote:
> > >
> > > > Perf and RT say we don't want quiet_vmstat() in the idle loop
> > > > either.
> > >
> > > BTW, the perf numbers were not from an RT kernel, they were from my
> > > PREEMPT_VOLUNTARY desktop kernel.
> >
> > Can we move quiet_vmstat() elsewhere after we have checked that really
> > nothing else is going on soon?
>
> How would you check?  Precognition doesn't work for mortals.

Dont we have some decision mechanism to go into higher levels of power
savings when the system is idle for longer times?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 17:08                                           ` Mike Galbraith
@ 2016-01-26 18:22                                             ` Christoph Lameter
  2016-01-26 19:09                                               ` Mike Galbraith
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 18:22 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote:
> > On Tue, 26 Jan 2016, Mike Galbraith wrote:
> >
> > > > Why would the deferring cause this overhead?
> > >
> > > Because we schedule to idle cores aggressively, thus we may pop in and
> > > out of idle at high frequency.
> >
> > Whats the point of going idle if you have things to do soon?
>
> When a task schedules off, how do you know it'll be back at all, much
> less soon?

Ok so you are running an artificial benchmark that always gets the
system running again when it decides to go idle?

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 16:25                                         ` Christoph Lameter
@ 2016-01-26 18:31                                           ` Mike Galbraith
  2016-01-26 18:34                                             ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26 18:31 UTC (permalink / raw)
  To: Christoph Lameter, Michal Hocko; +Cc: Peter Zijlstra, LKML

On Tue, 2016-01-26 at 10:25 -0600, Christoph Lameter wrote:
> On Mon, 25 Jan 2016, Michal Hocko wrote:
> 
> > > Why would the deferring cause this overhead?
> > 
> > I guess the profile speaks for itself, doesn't it?
> 
> But the system is going idle? Why would this impact performance?

We enter/exit idle a lot.

Your reluctance to move it seem to suggest that 99.99% of CPUs on the
planet chewing up cycles (measured) doing what for most is useless work
on every micro-idle is a perfectly fine price to pay to ensure that
.01% (or whatever tiny minority) get what they want.

I disagree.  You're burning electrons for no benefit at all to me on my
box.  You want to do high speed trading, that's fine, but I expect my
box to be able to pop in and out of idle without having to pay a toll
to the high speed trading bandits of the world, thank you very much.

This specialty thing does not belong in the generic fast path.

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 18:31                                           ` Mike Galbraith
@ 2016-01-26 18:34                                             ` Christoph Lameter
  2016-01-26 18:45                                               ` Mike Galbraith
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 18:34 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> I disagree.  You're burning electrons for no benefit at all to me on my
> box.  You want to do high speed trading, that's fine, but I expect my
> box to be able to pop in and out of idle without having to pay a toll
> to the high speed trading bandits of the world, thank you very much.
>
> This specialty thing does not belong in the generic fast path.

The system going idle is a fastpath. Mind boogling.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 18:34                                             ` Christoph Lameter
@ 2016-01-26 18:45                                               ` Mike Galbraith
  2016-01-26 19:20                                                 ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26 18:45 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 2016-01-26 at 12:34 -0600, Christoph Lameter wrote:
> On Tue, 26 Jan 2016, Mike Galbraith wrote:
> 
> > I disagree.  You're burning electrons for no benefit at all to me on my
> > box.  You want to do high speed trading, that's fine, but I expect my
> > box to be able to pop in and out of idle without having to pay a toll
> > to the high speed trading bandits of the world, thank you very much.
> > 
> > This specialty thing does not belong in the generic fast path.
> 
> The system going idle is a fastpath. Mind boogling.

Hohum, noted.  Now what about those cycles, and the sleeping lock you
injected for -rt?

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 18:22                                             ` Christoph Lameter
@ 2016-01-26 19:09                                               ` Mike Galbraith
  2016-01-26 19:22                                                 ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Mike Galbraith @ 2016-01-26 19:09 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 2016-01-26 at 12:22 -0600, Christoph Lameter wrote:
> On Tue, 26 Jan 2016, Mike Galbraith wrote:
> 
> > On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote:
> > > On Tue, 26 Jan 2016, Mike Galbraith wrote:
> > > 
> > > > > Why would the deferring cause this overhead?
> > > > 
> > > > Because we schedule to idle cores aggressively, thus we may pop
> > > > in and
> > > > out of idle at high frequency.
> > > 
> > > Whats the point of going idle if you have things to do soon?
> > 
> > When a task schedules off, how do you know it'll be back at all,
> > much
> > less soon?
> 
> Ok so you are running an artificial benchmark that always gets the
> system running again when it decides to go idle?

The benchmark does not alter the cycle expenditure per event.  The real
world will pay the toll less frequently than the artificial benchmark,
yes, but it will pay nonetheless, and for most, needlessly.  Or?

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 18:45                                               ` Mike Galbraith
@ 2016-01-26 19:20                                                 ` Christoph Lameter
  2016-01-27  3:12                                                   ` Mike Galbraith
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 19:20 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> > The system going idle is a fastpath. Mind boogling.
>
> Hohum, noted.  Now what about those cycles, and the sleeping lock you
> injected for -rt?

Since we (the NOHZ people) care mostly about NOHZ then lets restrict
that to the NOHZ mode. Then it should not affect your load.


Subject: Move quiet_vmstat() to NOHZ code

quiet_vmstat() seems to cause regressions for some load because
the cpu going idle is a "fastpath". Mind boogling. Strange claim.
If the system goes idle then it has nothing to do after all

But anyways if we shift the quiet_vmstat() into the NOHZ logic
when it stops the tick then it will only affect those cores that
are setup for NOHZ mode. That is where we want this processing
after all to ensure that the OS keeps itself off those cores.

Signed-off-by: Christoph Lameter <cl@linux.com>


Index: linux/kernel/sched/idle.c
===================================================================
--- linux.orig/kernel/sched/idle.c
+++ linux/kernel/sched/idle.c
@@ -213,7 +213,6 @@ static void cpu_idle_loop(void)
 		 */

 		__current_set_polling();
-		quiet_vmstat();
 		tick_nohz_idle_enter();

 		while (!need_resched()) {
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -811,6 +811,7 @@ static void __tick_nohz_idle_enter(struc
 		ts->idle_calls++;

 		expires = tick_nohz_stop_sched_tick(ts, now, cpu);
+		quiet_vmstat();
 		if (expires.tv64 > 0LL) {
 			ts->idle_sleeps++;
 			ts->idle_expires = expires;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 19:09                                               ` Mike Galbraith
@ 2016-01-26 19:22                                                 ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-26 19:22 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Tue, 26 Jan 2016, Mike Galbraith wrote:

> > Ok so you are running an artificial benchmark that always gets the
> > system running again when it decides to go idle?
>
> The benchmark does not alter the cycle expenditure per event.  The real
> world will pay the toll less frequently than the artificial benchmark,
> yes, but it will pay nonetheless, and for most, needlessly.  Or?

Well if it does not pay it there then it is going to pay for it later when
the vmstat_updater needs to cause a wakeup to fold the data. Also not
folding the vmstat updates increases the time that minor updates are
deferred and thus impacts the accuracy of the global vm counters. That
seems to have been important recently.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-26 19:20                                                 ` Christoph Lameter
@ 2016-01-27  3:12                                                   ` Mike Galbraith
  2016-01-27  4:15                                                     ` Mike Galbraith
  2016-01-27 16:28                                                     ` Christoph Lameter
  0 siblings, 2 replies; 114+ messages in thread
From: Mike Galbraith @ 2016-01-27  3:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML

Good morning,

On Tue, 2016-01-26 at 13:20 -0600, Christoph Lameter wrote:
> On Tue, 26 Jan 2016, Mike Galbraith wrote:
> 
> > > The system going idle is a fastpath. Mind boogling.
> > 
> > Hohum, noted.  Now what about those cycles, and the sleeping lock you
> > injected for -rt?
> 
> Since we (the NOHZ people) care mostly about NOHZ then lets restrict
> that to the NOHZ mode. Then it should not affect your load.

Tons of folks do have NO_HZ enabled (including me).  Isn't there a spot
somewhere in NO_HZ_FULL code where it can take up residence?  (one with
a tad lower maximum call frequency would be good, a nohz_full cpu isn't
necessarily being used for pure compute exclusively) 

> Subject: Move quiet_vmstat() to NOHZ code
> 
> quiet_vmstat() seems to cause regressions for some load because
> the cpu going idle is a "fastpath". Mind boogling. Strange claim.
> If the system goes idle then it has nothing to do after all
> 
> But anyways if we shift the quiet_vmstat() into the NOHZ logic
> when it stops the tick then it will only affect those cores that
> are setup for NOHZ mode. That is where we want this processing
> after all to ensure that the OS keeps itself off those cores.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> 
> Index: linux/kernel/sched/idle.c
> ===================================================================
> --- linux.orig/kernel/sched/idle.c
> +++ linux/kernel/sched/idle.c
> @@ -213,7 +213,6 @@ static void cpu_idle_loop(void)
>  		 */
> 
>  		__current_set_polling();
> -		quiet_vmstat();
>  		tick_nohz_idle_enter();
> 
>  		while (!need_resched()) {
> Index: linux/kernel/time/tick-sched.c
> ===================================================================
> --- linux.orig/kernel/time/tick-sched.c
> +++ linux/kernel/time/tick-sched.c
> @@ -811,6 +811,7 @@ static void __tick_nohz_idle_enter(struc
>  		ts->idle_calls++;
> 
>  		expires = tick_nohz_stop_sched_tick(ts, now, cpu);
> +		quiet_vmstat();
>  		if (expires.tv64 > 0LL) {
>  			ts->idle_sleeps++;
>  			ts->idle_expires = expires;

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-27  3:12                                                   ` Mike Galbraith
@ 2016-01-27  4:15                                                     ` Mike Galbraith
  2016-01-27 16:28                                                     ` Christoph Lameter
  1 sibling, 0 replies; 114+ messages in thread
From: Mike Galbraith @ 2016-01-27  4:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML

On Wed, 2016-01-27 at 04:12 +0100, Mike Galbraith wrote:
> Good morning,
> 
> On Tue, 2016-01-26 at 13:20 -0600, Christoph Lameter wrote:
> > On Tue, 26 Jan 2016, Mike Galbraith wrote:
> > 
> > > > The system going idle is a fastpath. Mind boogling.
> > > 
> > > Hohum, noted.  Now what about those cycles, and the sleeping lock you
> > > injected for -rt?
> > 
> > Since we (the NOHZ people) care mostly about NOHZ then lets restrict
> > that to the NOHZ mode. Then it should not affect your load.
> 
> Tons of folks do have NO_HZ enabled (including me).  Isn't there a spot
> somewhere in NO_HZ_FULL code where it can take up residence?  (one with
> a tad lower maximum call frequency would be good, a nohz_full cpu isn't
> necessarily being used for pure compute exclusively)

I forgot to mention that the spot you picked is called with irqs
disabled.

> > Subject: Move quiet_vmstat() to NOHZ code
> > 
> > quiet_vmstat() seems to cause regressions for some load because
> > the cpu going idle is a "fastpath". Mind boogling. Strange claim.
> > If the system goes idle then it has nothing to do after all

Hm, seems I also forgot to say "Hohum, noted..." again.

	-Mike

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-27  3:12                                                   ` Mike Galbraith
  2016-01-27  4:15                                                     ` Mike Galbraith
@ 2016-01-27 16:28                                                     ` Christoph Lameter
  2016-01-28 15:36                                                       ` Frederic Weisbecker
  1 sibling, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-27 16:28 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Michal Hocko, Mike Galbraith, Peter Zijlstra, LKML

On Wed, 27 Jan 2016, Mike Galbraith wrote:

> > Since we (the NOHZ people) care mostly about NOHZ then lets restrict
> > that to the NOHZ mode. Then it should not affect your load.
>
> Tons of folks do have NO_HZ enabled (including me).  Isn't there a spot
> somewhere in NO_HZ_FULL code where it can take up residence?  (one with
> a tad lower maximum call frequency would be good, a nohz_full cpu isn't
> necessarily being used for pure compute exclusively)

Frederic, any idea on where to put quiet_vmstat()?


> > Subject: Move quiet_vmstat() to NOHZ code
> >
> > quiet_vmstat() seems to cause regressions for some load because
> > the cpu going idle is a "fastpath". Mind boogling. Strange claim.
> > If the system goes idle then it has nothing to do after all
> >
> > But anyways if we shift the quiet_vmstat() into the NOHZ logic
> > when it stops the tick then it will only affect those cores that
> > are setup for NOHZ mode. That is where we want this processing
> > after all to ensure that the OS keeps itself off those cores.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
> >
> >
> > Index: linux/kernel/sched/idle.c
> > ===================================================================
> > --- linux.orig/kernel/sched/idle.c
> > +++ linux/kernel/sched/idle.c
> > @@ -213,7 +213,6 @@ static void cpu_idle_loop(void)
> >  		 */
> >
> >  		__current_set_polling();
> > -		quiet_vmstat();
> >  		tick_nohz_idle_enter();
> >
> >  		while (!need_resched()) {
> > Index: linux/kernel/time/tick-sched.c
> > ===================================================================
> > --- linux.orig/kernel/time/tick-sched.c
> > +++ linux/kernel/time/tick-sched.c
> > @@ -811,6 +811,7 @@ static void __tick_nohz_idle_enter(struc
> >  		ts->idle_calls++;
> >
> >  		expires = tick_nohz_stop_sched_tick(ts, now, cpu);
> > +		quiet_vmstat();
> >  		if (expires.tv64 > 0LL) {
> >  			ts->idle_sleeps++;
> >  			ts->idle_expires = expires;
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-23 16:21                                 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith
  2016-01-24  0:33                                   ` Christoph Lameter
  2016-01-25 17:42                                   ` Michal Hocko
@ 2016-01-27 16:48                                   ` Michal Hocko
  2016-01-27 17:03                                     ` Mike Galbraith
  2016-01-27 18:26                                     ` Christoph Lameter
  2 siblings, 2 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-27 16:48 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Christoph Lameter, Peter Zijlstra, LKML

On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> Hi Christoph,
> 
> While you're fixing that commit up, can you perhaps find a better home
> for quiet_vmstat()?  It not only munches cycles when switching cross
> -core mightily, for -rt it injects a sleeping lock into the idle task.
> 
>     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
>      4.75%  [kernel]       [k] __schedule                  
>      4.70%  [kernel]       [k] mutex_unlock                
>      3.14%  [kernel]       [k] __switch_to                 

What about the following fix?
---
>From c74a04c4fdfe1fa67933bb1ac83d3de0532aaab2 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 27 Jan 2016 17:24:22 +0100
Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter

Mike has reported a considerable overhead of refresh_cpu_vm_stats from
the idle entry during pipe test:
    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule
     4.70%  [kernel]       [k] mutex_unlock
     3.14%  [kernel]       [k] __switch_to

This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") which has placed quiet_vmstat into
cpu_idle_loop. The main reason here seems to be that the idle entry has
to get over all zones and perform atomic operations for each vmstat
entry even though there might be no per cpu diffs. This is a pointless
overhead for _each_ idle entry.

Make sure that quiet_vmstat is as light as possible.

 First of all it doesn't make any sense to do any local sync if the
current cpu is already set in oncpu_stat_off because vmstat_update puts
itself there only if there is nothing to do.

Then we can check need_update which should be a cheap way to check for
potential per-cpu diffs and only then do refresh_cpu_vm_stats.

The original patch also did cancel_delayed_work which we are not doing
here. There are two reasons for that. Firstly cancel_delayed_work from
idle context will blow up on RT kernels (reported by Mike):
[    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
[    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000
[    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0
[    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88
[    2.283941] Call Trace:
[    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
[    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
[    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
[    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
[    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0
[    2.289094]  [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0
[    2.289949]  [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0
[    2.290800]  [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20
[    2.291647]  [<ffffffff81036164>] start_secondary+0x114/0x140

And secondly, even on !RT kernels it might add some non trivial overhead
which is not necessary. Even if the vmstat worker wakes up and preempts
idle then it will be most likely a single shot noop because the stats
were already synced and so it would end up on the oncpu_stat_off anyway.
We just need to teach both vmstat_shepherd and vmstat_update to stop
scheduling the worker if there is nothing to do.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmstat.c | 60 ++++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2c74ddf16..7747f85537b6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct *w)
 		 * Counters were updated so we expect more updates
 		 * to occur in the future. Keep on running the
 		 * update worker thread.
+		 * If we were marked on cpu_stat_off clear the flag
+		 * so that vmstat_shepherd doesn't schedule us again.
 		 */
-		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
-			this_cpu_ptr(&vmstat_work),
-			round_jiffies_relative(sysctl_stat_interval));
+		if (!cpumask_test_and_clear_cpu(smp_processor_id(),
+						cpu_stat_off)) {
+			queue_delayed_work_on(smp_processor_id(), vmstat_wq,
+				this_cpu_ptr(&vmstat_work),
+				round_jiffies_relative(sysctl_stat_interval));
+		}
 	} else {
 		/*
 		 * We did not update any counters so the app may be in
@@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct *w)
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
  */
-void quiet_vmstat(void)
-{
-	if (system_state != SYSTEM_RUNNING)
-		return;
-
-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
-}
-
 /*
  * Check if the diffs for a certain cpu indicate that
  * an update is needed.
@@ -1452,6 +1445,30 @@ static bool need_update(int cpu)
 	return false;
 }
 
+void quiet_vmstat(void)
+{
+	if (system_state != SYSTEM_RUNNING)
+		return;
+
+	/*
+	 * If we are already in hands of the shepherd then there
+	 * is nothing for us to do here.
+	 */
+	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+		return;
+
+	if (!need_update(smp_processor_id()))
+		return;
+
+	/*
+	 * Just refresh counters and do not care about the pending delayed
+	 * vmstat_update. It doesn't fire that often to matter and canceling
+	 * it would be too expensive from this path.
+	 * vmstat_shepherd will take care about that for us.
+	 */
+	refresh_cpu_vm_stats(false);
+}
+
 
 /*
  * Shepherd worker thread that checks the
@@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w)
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
-		if (need_update(cpu) &&
-			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
-
-			queue_delayed_work_on(cpu, vmstat_wq,
-				&per_cpu(vmstat_work, cpu), 0);
+		if (need_update(cpu)) {
+			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+				queue_delayed_work_on(cpu, vmstat_wq,
+					&per_cpu(vmstat_work, cpu), 0);
+		} else {
+			cpumask_set_cpu(cpu, cpu_stat_off);
+			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+		}
 
 	put_online_cpus();
 
-- 
2.7.0.rc3

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-27 16:48                                   ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko
@ 2016-01-27 17:03                                     ` Mike Galbraith
  2016-01-27 18:26                                     ` Christoph Lameter
  1 sibling, 0 replies; 114+ messages in thread
From: Mike Galbraith @ 2016-01-27 17:03 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Christoph Lameter, Peter Zijlstra, LKML

On Wed, 2016-01-27 at 17:48 +0100, Michal Hocko wrote:
> On Sat 23-01-16 17:21:55, Mike Galbraith wrote:
> > Hi Christoph,
> > 
> > While you're fixing that commit up, can you perhaps find a better
> > home
> > for quiet_vmstat()?  It not only munches cycles when switching
> > cross
> > -core mightily, for -rt it injects a sleeping lock into the idle
> > task.
> > 
> >     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
> >      4.75%  [kernel]       [k] __schedule                  
> >      4.70%  [kernel]       [k] mutex_unlock                
> >      3.14%  [kernel]       [k] __switch_to                 
> 
> What about the following fix?

I haven't done _extensive_ testing, but 4.5-rc1-rt (with NO_HZ_FULL
enabling hacks restored, as it's otherwise a dead .config item) is
gripe free whether nohz_full CPUs are in use or not, and overhead went
away in it and desktop configured master kernel (PREEMPT_VOLUNTARY).

> ---
> From c74a04c4fdfe1fa67933bb1ac83d3de0532aaab2 Mon Sep 17 00:00:00
> 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 27 Jan 2016 17:24:22 +0100
> Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter
> 
> Mike has reported a considerable overhead of refresh_cpu_vm_stats
> from
> the idle entry during pipe test:
>     12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
>      4.75%  [kernel]       [k] __schedule
>      4.70%  [kernel]       [k] mutex_unlock
>      3.14%  [kernel]       [k] __switch_to
> 
> This is caused by 0eb77e988032 ("vmstat: make vmstat_updater
> deferrable
> again and shut down on idle") which has placed quiet_vmstat into
> cpu_idle_loop. The main reason here seems to be that the idle entry
> has
> to get over all zones and perform atomic operations for each vmstat
> entry even though there might be no per cpu diffs. This is a
> pointless
> overhead for _each_ idle entry.
> 
> Make sure that quiet_vmstat is as light as possible.
> 
>  First of all it doesn't make any sense to do any local sync if the
> current cpu is already set in oncpu_stat_off because vmstat_update
> puts
> itself there only if there is nothing to do.
> 
> Then we can check need_update which should be a cheap way to check
> for
> potential per-cpu diffs and only then do refresh_cpu_vm_stats.
> 
> The original patch also did cancel_delayed_work which we are not
> doing
> here. There are two reasons for that. Firstly cancel_delayed_work
> from
> idle context will blow up on RT kernels (reported by Mike):
> [    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
> [    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS
> M7848W08.20C 09/23/2013
> [    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2
> 0000000000000000
> [    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0
> ffff88041ec501e0
> [    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0
> ffff88040b01fe88
> [    2.283941] Call Trace:
> [    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
> [    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
> [    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
> [    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
> [    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0
> [    2.289094]  [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0
> [    2.289949]  [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0
> [    2.290800]  [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20
> [    2.291647]  [<ffffffff81036164>] start_secondary+0x114/0x140
> 
> And secondly, even on !RT kernels it might add some non trivial
> overhead
> which is not necessary. Even if the vmstat worker wakes up and
> preempts
> idle then it will be most likely a single shot noop because the stats
> were already synced and so it would end up on the oncpu_stat_off
> anyway.
> We just need to teach both vmstat_shepherd and vmstat_update to stop
> scheduling the worker if there is nothing to do.
> 
> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmstat.c | 60 ++++++++++++++++++++++++++++++++++++++++-----------
> ---------
>  1 file changed, 40 insertions(+), 20 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 40b2c74ddf16..7747f85537b6 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct
> *w)
>  		 * Counters were updated so we expect more updates
>  		 * to occur in the future. Keep on running the
>  		 * update worker thread.
> +		 * If we were marked on cpu_stat_off clear the flag
> +		 * so that vmstat_shepherd doesn't schedule us
> again.
>  		 */
> -		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
> -			this_cpu_ptr(&vmstat_work),
> -			round_jiffies_relative(sysctl_stat_interval)
> );
> +		if (!cpumask_test_and_clear_cpu(smp_processor_id(),
> +						cpu_stat_off)) {
> +			queue_delayed_work_on(smp_processor_id(),
> vmstat_wq,
> +				this_cpu_ptr(&vmstat_work),
> +				round_jiffies_relative(sysctl_stat_i
> nterval));
> +		}
>  	} else {
>  		/*
>  		 * We did not update any counters so the app may be
> in
> @@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct
> *w)
>   * until the diffs stay at zero. The function is used by NOHZ and
> can only be
>   * invoked when tick processing is not active.
>   */
> -void quiet_vmstat(void)
> -{
> -	if (system_state != SYSTEM_RUNNING)
> -		return;
> -
> -	do {
> -		if (!cpumask_test_and_set_cpu(smp_processor_id(),
> cpu_stat_off))
> -			cancel_delayed_work(this_cpu_ptr(&vmstat_wor
> k));
> -
> -	} while (refresh_cpu_vm_stats(false));
> -}
> -
>  /*
>   * Check if the diffs for a certain cpu indicate that
>   * an update is needed.
> @@ -1452,6 +1445,30 @@ static bool need_update(int cpu)
>  	return false;
>  }
>  
> +void quiet_vmstat(void)
> +{
> +	if (system_state != SYSTEM_RUNNING)
> +		return;
> +
> +	/*
> +	 * If we are already in hands of the shepherd then there
> +	 * is nothing for us to do here.
> +	 */
> +	if (cpumask_test_and_set_cpu(smp_processor_id(),
> cpu_stat_off))
> +		return;
> +
> +	if (!need_update(smp_processor_id()))
> +		return;
> +
> +	/*
> +	 * Just refresh counters and do not care about the pending
> delayed
> +	 * vmstat_update. It doesn't fire that often to matter and
> canceling
> +	 * it would be too expensive from this path.
> +	 * vmstat_shepherd will take care about that for us.
> +	 */
> +	refresh_cpu_vm_stats(false);
> +}
> +
>  
>  /*
>   * Shepherd worker thread that checks the
> @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct
> work_struct *w)
>  	get_online_cpus();
>  	/* Check processors whose vmstat worker threads have been
> disabled */
>  	for_each_cpu(cpu, cpu_stat_off)
> -		if (need_update(cpu) &&
> -			cpumask_test_and_clear_cpu(cpu,
> cpu_stat_off))
> -
> -			queue_delayed_work_on(cpu, vmstat_wq,
> -				&per_cpu(vmstat_work, cpu), 0);
> +		if (need_update(cpu)) {
> +			if (cpumask_test_and_clear_cpu(cpu,
> cpu_stat_off))
> +				queue_delayed_work_on(cpu,
> vmstat_wq,
> +					&per_cpu(vmstat_work, cpu),
> 0);
> +		} else {
> +			cpumask_set_cpu(cpu, cpu_stat_off);
> +			cancel_delayed_work(this_cpu_ptr(&vmstat_wor
> k));
> +		}
>  
>  	put_online_cpus();
>  
> -- 
> 2.7.0.rc3
> 

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-27 16:48                                   ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko
  2016-01-27 17:03                                     ` Mike Galbraith
@ 2016-01-27 18:26                                     ` Christoph Lameter
  2016-01-28 15:21                                       ` Michal Hocko
  2016-01-28 15:31                                       ` Michal Hocko
  1 sibling, 2 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-27 18:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Wed, 27 Jan 2016, Michal Hocko wrote:

> This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
> again and shut down on idle") which has placed quiet_vmstat into
> cpu_idle_loop. The main reason here seems to be that the idle entry has
> to get over all zones and perform atomic operations for each vmstat
> entry even though there might be no per cpu diffs. This is a pointless
> overhead for _each_ idle entry.
>
> Make sure that quiet_vmstat is as light as possible.
>
>  First of all it doesn't make any sense to do any local sync if the
> current cpu is already set in oncpu_stat_off because vmstat_update puts
> itself there only if there is nothing to do.

But there might be something to do that happened in the meantime because
counters were incremented. It needs to be checked. The shepherd can do
that but it will delay the folding of diffs for awhile.

> +void quiet_vmstat(void)
> +{
> +	if (system_state != SYSTEM_RUNNING)
> +		return;
> +
> +	/*
> +	 * If we are already in hands of the shepherd then there
> +	 * is nothing for us to do here.
> +	 */
> +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> +		return;
> +
> +	if (!need_update(smp_processor_id()))
> +		return;
> +
> +	/*
> +	 * Just refresh counters and do not care about the pending delayed
> +	 * vmstat_update. It doesn't fire that often to matter and canceling
> +	 * it would be too expensive from this path.
> +	 * vmstat_shepherd will take care about that for us.
> +	 */
> +	refresh_cpu_vm_stats(false);
> +}

The problem here is that there will be an additional tick generated on
idle. This is an issue for power because now the processor has to
needlessly wake up again, do tick processing etc  just to effectively do a
cancel_delayed_work().

The cancelling of the work request is required to avoid this additonal
tick.

> @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w)
>  	get_online_cpus();
>  	/* Check processors whose vmstat worker threads have been disabled */
>  	for_each_cpu(cpu, cpu_stat_off)
> -		if (need_update(cpu) &&
> -			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> -
> -			queue_delayed_work_on(cpu, vmstat_wq,
> -				&per_cpu(vmstat_work, cpu), 0);
> +		if (need_update(cpu)) {
> +			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> +				queue_delayed_work_on(cpu, vmstat_wq,
> +					&per_cpu(vmstat_work, cpu), 0);
> +		} else {
> +			cpumask_set_cpu(cpu, cpu_stat_off);

Umm the flag is already set right? This just causes a bouncing cacheline
since we are accessing a global set of cpus,. Drop this line?

> +			cancel_delayed_work(this_cpu_ptr(&vmstat_work));

Ok so this is to move the cancel into the shepherd.

Aside from the two issues mentioned this looks good.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-27 18:26                                     ` Christoph Lameter
@ 2016-01-28 15:21                                       ` Michal Hocko
  2016-01-28 16:40                                         ` Christoph Lameter
  2016-01-28 15:31                                       ` Michal Hocko
  1 sibling, 1 reply; 114+ messages in thread
From: Michal Hocko @ 2016-01-28 15:21 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Wed 27-01-16 12:26:16, Christoph Lameter wrote:
> On Wed, 27 Jan 2016, Michal Hocko wrote:
> 
> > This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
> > again and shut down on idle") which has placed quiet_vmstat into
> > cpu_idle_loop. The main reason here seems to be that the idle entry has
> > to get over all zones and perform atomic operations for each vmstat
> > entry even though there might be no per cpu diffs. This is a pointless
> > overhead for _each_ idle entry.
> >
> > Make sure that quiet_vmstat is as light as possible.
> >
> >  First of all it doesn't make any sense to do any local sync if the
> > current cpu is already set in oncpu_stat_off because vmstat_update puts
> > itself there only if there is nothing to do.
> 
> But there might be something to do that happened in the meantime because
> counters were incremented. It needs to be checked. The shepherd can do
> that but it will delay the folding of diffs for awhile.

shepherd ticks with the same period so it should run relatively soon.
What is important here the test_and_set is the fast path which will
trigger a lot of the current CPU is mostly running in the userspace.

> > +void quiet_vmstat(void)
> > +{
> > +	if (system_state != SYSTEM_RUNNING)
> > +		return;
> > +
> > +	/*
> > +	 * If we are already in hands of the shepherd then there
> > +	 * is nothing for us to do here.
> > +	 */
> > +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > +		return;
> > +
> > +	if (!need_update(smp_processor_id()))
> > +		return;
> > +
> > +	/*
> > +	 * Just refresh counters and do not care about the pending delayed
> > +	 * vmstat_update. It doesn't fire that often to matter and canceling
> > +	 * it would be too expensive from this path.
> > +	 * vmstat_shepherd will take care about that for us.
> > +	 */
> > +	refresh_cpu_vm_stats(false);
> > +}
> 
> The problem here is that there will be an additional tick generated on
> idle. This is an issue for power because now the processor has to
> needlessly wake up again, do tick processing etc  just to effectively do a
> cancel_delayed_work().
> 
> The cancelling of the work request is required to avoid this additonal
> tick.

Yes but as mentioned in the changelog calling cancel_delayed_work is
quite expensive to be run on each idle entry slow path. In addition it
is RT unfriendly. So considering a single tick vs. other reasons I think
this is a reasonable compromise.

> > @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w)
> >  	get_online_cpus();
> >  	/* Check processors whose vmstat worker threads have been disabled */
> >  	for_each_cpu(cpu, cpu_stat_off)
> > -		if (need_update(cpu) &&
> > -			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> > -
> > -			queue_delayed_work_on(cpu, vmstat_wq,
> > -				&per_cpu(vmstat_work, cpu), 0);
> > +		if (need_update(cpu)) {
> > +			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> > +				queue_delayed_work_on(cpu, vmstat_wq,
> > +					&per_cpu(vmstat_work, cpu), 0);
> > +		} else {
> > +			cpumask_set_cpu(cpu, cpu_stat_off);
> 
> Umm the flag is already set right? This just causes a bouncing cacheline
> since we are accessing a global set of cpus,. Drop this line?

Ohh, right you are. I will remove this.

> 
> > +			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
> 
> Ok so this is to move the cancel into the shepherd.

Yeah, I have added a comment to explain that.

> Aside from the two issues mentioned this looks good.

Updated version:
---
>From 1d0febff044382b88ff44d245b5afe1a6e402c62 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 27 Jan 2016 17:24:22 +0100
Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter

Mike has reported a considerable overhead of refresh_cpu_vm_stats from
the idle entry during pipe test:
    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule
     4.70%  [kernel]       [k] mutex_unlock
     3.14%  [kernel]       [k] __switch_to

This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") which has placed quiet_vmstat into
cpu_idle_loop. The main reason here seems to be that the idle entry has
to get over all zones and perform atomic operations for each vmstat
entry even though there might be no per cpu diffs. This is a pointless
overhead for _each_ idle entry.

Make sure that quiet_vmstat is as light as possible.

 First of all it doesn't make any sense to do any local sync if the
current cpu is already set in oncpu_stat_off because vmstat_update puts
itself there only if there is nothing to do.

Then we can check need_update which should be a cheap way to check for
potential per-cpu diffs and only then do refresh_cpu_vm_stats.

The original patch also did cancel_delayed_work which we are not doing
here. There are two reasons for that. Firstly cancel_delayed_work from
idle context will blow up on RT kernels (reported by Mike):
[    2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
[    2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[    2.281316]  ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000
[    2.282202]  ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0
[    2.283073]  ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88
[    2.283941] Call Trace:
[    2.284797]  [<ffffffff812d20e2>] dump_stack+0x49/0x67
[    2.285658]  [<ffffffff81081095>] ___might_sleep+0xf5/0x180
[    2.286521]  [<ffffffff815ff910>] rt_spin_lock+0x20/0x50
[    2.287382]  [<ffffffff81075919>] try_to_grab_pending+0x69/0x240
[    2.288239]  [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0
[    2.289094]  [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0
[    2.289949]  [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0
[    2.290800]  [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20
[    2.291647]  [<ffffffff81036164>] start_secondary+0x114/0x140

And secondly, even on !RT kernels it might add some non trivial overhead
which is not necessary. Even if the vmstat worker wakes up and preempts
idle then it will be most likely a single shot noop because the stats
were already synced and so it would end up on the oncpu_stat_off anyway.
We just need to teach both vmstat_shepherd and vmstat_update to stop
scheduling the worker if there is nothing to do.

Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmstat.c | 64 ++++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2c74ddf16..eb30bf45bd55 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct *w)
 		 * Counters were updated so we expect more updates
 		 * to occur in the future. Keep on running the
 		 * update worker thread.
+		 * If we were marked on cpu_stat_off clear the flag
+		 * so that vmstat_shepherd doesn't schedule us again.
 		 */
-		queue_delayed_work_on(smp_processor_id(), vmstat_wq,
-			this_cpu_ptr(&vmstat_work),
-			round_jiffies_relative(sysctl_stat_interval));
+		if (!cpumask_test_and_clear_cpu(smp_processor_id(),
+						cpu_stat_off)) {
+			queue_delayed_work_on(smp_processor_id(), vmstat_wq,
+				this_cpu_ptr(&vmstat_work),
+				round_jiffies_relative(sysctl_stat_interval));
+		}
 	} else {
 		/*
 		 * We did not update any counters so the app may be in
@@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct *w)
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
  */
-void quiet_vmstat(void)
-{
-	if (system_state != SYSTEM_RUNNING)
-		return;
-
-	do {
-		if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
-			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
-
-	} while (refresh_cpu_vm_stats(false));
-}
-
 /*
  * Check if the diffs for a certain cpu indicate that
  * an update is needed.
@@ -1452,6 +1445,30 @@ static bool need_update(int cpu)
 	return false;
 }
 
+void quiet_vmstat(void)
+{
+	if (system_state != SYSTEM_RUNNING)
+		return;
+
+	/*
+	 * If we are already in hands of the shepherd then there
+	 * is nothing for us to do here.
+	 */
+	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
+		return;
+
+	if (!need_update(smp_processor_id()))
+		return;
+
+	/*
+	 * Just refresh counters and do not care about the pending delayed
+	 * vmstat_update. It doesn't fire that often to matter and canceling
+	 * it would be too expensive from this path.
+	 * vmstat_shepherd will take care about that for us.
+	 */
+	refresh_cpu_vm_stats(false);
+}
+
 
 /*
  * Shepherd worker thread that checks the
@@ -1470,11 +1487,18 @@ static void vmstat_shepherd(struct work_struct *w)
 	get_online_cpus();
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_cpu(cpu, cpu_stat_off)
-		if (need_update(cpu) &&
-			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
-
-			queue_delayed_work_on(cpu, vmstat_wq,
-				&per_cpu(vmstat_work, cpu), 0);
+		if (need_update(cpu)) {
+			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
+				queue_delayed_work_on(cpu, vmstat_wq,
+					&per_cpu(vmstat_work, cpu), 0);
+		} else {
+			/*
+			 * Cancel the work if quiet_vmstat has put this
+			 * cpu on cpu_stat_off because the work item might
+			 * be still scheduled
+			 */
+			cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+		}
 
 	put_online_cpus();
 
-- 
2.7.0.rc3


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-27 18:26                                     ` Christoph Lameter
  2016-01-28 15:21                                       ` Michal Hocko
@ 2016-01-28 15:31                                       ` Michal Hocko
  2016-01-28 15:37                                         ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko
  2016-01-28 16:42                                         ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter
  1 sibling, 2 replies; 114+ messages in thread
From: Michal Hocko @ 2016-01-28 15:31 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Wed 27-01-16 12:26:16, Christoph Lameter wrote:
> On Wed, 27 Jan 2016, Michal Hocko wrote:
[...]
> > +void quiet_vmstat(void)
> > +{
> > +	if (system_state != SYSTEM_RUNNING)
> > +		return;
> > +
> > +	/*
> > +	 * If we are already in hands of the shepherd then there
> > +	 * is nothing for us to do here.
> > +	 */
> > +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > +		return;
> > +
> > +	if (!need_update(smp_processor_id()))
> > +		return;
> > +
> > +	/*
> > +	 * Just refresh counters and do not care about the pending delayed
> > +	 * vmstat_update. It doesn't fire that often to matter and canceling
> > +	 * it would be too expensive from this path.
> > +	 * vmstat_shepherd will take care about that for us.
> > +	 */
> > +	refresh_cpu_vm_stats(false);
> > +}
> 
> The problem here is that there will be an additional tick generated on
> idle. This is an issue for power because now the processor has to
> needlessly wake up again, do tick processing etc  just to effectively do a
> cancel_delayed_work().

Thinking about it some more, making vmstat_update deferrable should help
to not interrupt idle no?
---
diff --git a/mm/vmstat.c b/mm/vmstat.c
index eb30bf45bd55..69537d2be6f6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
+		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
 			vmstat_update);
 
 	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-27 16:28                                                     ` Christoph Lameter
@ 2016-01-28 15:36                                                       ` Frederic Weisbecker
  2016-01-28 16:42                                                         ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Frederic Weisbecker @ 2016-01-28 15:36 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Michal Hocko, Mike Galbraith, Peter Zijlstra, LKML

On Wed, Jan 27, 2016 at 10:28:48AM -0600, Christoph Lameter wrote:
> On Wed, 27 Jan 2016, Mike Galbraith wrote:
> 
> > > Since we (the NOHZ people) care mostly about NOHZ then lets restrict
> > > that to the NOHZ mode. Then it should not affect your load.
> >
> > Tons of folks do have NO_HZ enabled (including me).  Isn't there a spot
> > somewhere in NO_HZ_FULL code where it can take up residence?  (one with
> > a tad lower maximum call frequency would be good, a nohz_full cpu isn't
> > necessarily being used for pure compute exclusively)
> 
> Frederic, any idea on where to put quiet_vmstat()?

I guess those who want this also want task isolation, right? And Chris integrates
it in his patchset.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-28 15:31                                       ` Michal Hocko
@ 2016-01-28 15:37                                         ` Michal Hocko
  2016-01-28 16:48                                           ` Christoph Lameter
  2016-01-28 16:42                                         ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter
  1 sibling, 1 reply; 114+ messages in thread
From: Michal Hocko @ 2016-01-28 15:37 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Thu 28-01-16 16:31:11, Michal Hocko wrote:
> On Wed 27-01-16 12:26:16, Christoph Lameter wrote:
> > On Wed, 27 Jan 2016, Michal Hocko wrote:
> [...]
> > > +void quiet_vmstat(void)
> > > +{
> > > +	if (system_state != SYSTEM_RUNNING)
> > > +		return;
> > > +
> > > +	/*
> > > +	 * If we are already in hands of the shepherd then there
> > > +	 * is nothing for us to do here.
> > > +	 */
> > > +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > > +		return;
> > > +
> > > +	if (!need_update(smp_processor_id()))
> > > +		return;
> > > +
> > > +	/*
> > > +	 * Just refresh counters and do not care about the pending delayed
> > > +	 * vmstat_update. It doesn't fire that often to matter and canceling
> > > +	 * it would be too expensive from this path.
> > > +	 * vmstat_shepherd will take care about that for us.
> > > +	 */
> > > +	refresh_cpu_vm_stats(false);
> > > +}
> > 
> > The problem here is that there will be an additional tick generated on
> > idle. This is an issue for power because now the processor has to
> > needlessly wake up again, do tick processing etc  just to effectively do a
> > cancel_delayed_work().
> 
> Thinking about it some more, making vmstat_update deferrable should help
> to not interrupt idle no?

With the full changelog
>From 89858bf594c020f065b4f415a12d75bee64a7954 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 28 Jan 2016 16:31:43 +0100
Subject: [PATCH] vmstat: make vmstat_update deferrable

0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut
down on idle") made vmstat_shepherd deferrable. vmstat_update itself
is still using standard timer which might interrupt idle task. This
is possible because "mm, vmstat: make quiet_vmstat lighter" removed
cancel_delayed_work from the quiet_vmstat. Change vmstat_work to
use DEFERRABLE_WORK to prevent from pointless wakeups from the idle
context.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmstat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index eb30bf45bd55..69537d2be6f6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
+		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
 			vmstat_update);
 
 	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
-- 
2.7.0.rc3


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-28 15:21                                       ` Michal Hocko
@ 2016-01-28 16:40                                         ` Christoph Lameter
  2016-01-28 16:53                                           ` Michal Hocko
  0 siblings, 1 reply; 114+ messages in thread
From: Christoph Lameter @ 2016-01-28 16:40 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Thu, 28 Jan 2016, Michal Hocko wrote:

> +void quiet_vmstat(void)
> +{
> +	if (system_state != SYSTEM_RUNNING)
> +		return;
> +
> +	/*
> +	 * If we are already in hands of the shepherd then there
> +	 * is nothing for us to do here.
> +	 */
> +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> +		return;
> +
> +	if (!need_update(smp_processor_id()))
> +		return;
> +

You can drop the need_update check. refresh_cpu_vm_stats() does the same
checks in a more efficient way. If you keep this the checks will
performed twice.

> +	/*
> +	 * Just refresh counters and do not care about the pending delayed
> +	 * vmstat_update. It doesn't fire that often to matter and canceling
> +	 * it would be too expensive from this path.
> +	 * vmstat_shepherd will take care about that for us.
> +	 */
> +	refresh_cpu_vm_stats(false);
> +}
> +
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-28 15:31                                       ` Michal Hocko
  2016-01-28 15:37                                         ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko
@ 2016-01-28 16:42                                         ` Christoph Lameter
  1 sibling, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-28 16:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Thu, 28 Jan 2016, Michal Hocko wrote:

> Thinking about it some more, making vmstat_update deferrable should help
> to not interrupt idle no?

It will align with the tick then. Yes indeed this should be done.
> ---
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index eb30bf45bd55..69537d2be6f6 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void)
>  	int cpu;
>
>  	for_each_possible_cpu(cpu)
> -		INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu),
> +		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>  			vmstat_update);
>
>  	if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL))
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle)
  2016-01-28 15:36                                                       ` Frederic Weisbecker
@ 2016-01-28 16:42                                                         ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-28 16:42 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Michal Hocko, Mike Galbraith, Peter Zijlstra, LKML

On Thu, 28 Jan 2016, Frederic Weisbecker wrote:

> > Frederic, any idea on where to put quiet_vmstat()?
>
> I guess those who want this also want task isolation, right? And Chris integrates
> it in his patchset.

Ok.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-28 15:37                                         ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko
@ 2016-01-28 16:48                                           ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-28 16:48 UTC (permalink / raw)
  To: akpm, Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Thu, 28 Jan 2016, Michal Hocko wrote:

> down on idle") made vmstat_shepherd deferrable. vmstat_update itself
> is still using standard timer which might interrupt idle task. This
> is possible because "mm, vmstat: make quiet_vmstat lighter" removed
> cancel_delayed_work from the quiet_vmstat. Change vmstat_work to
> use DEFERRABLE_WORK to prevent from pointless wakeups from the idle
> context.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-28 16:40                                         ` Christoph Lameter
@ 2016-01-28 16:53                                           ` Michal Hocko
  2016-01-28 17:05                                             ` Christoph Lameter
  0 siblings, 1 reply; 114+ messages in thread
From: Michal Hocko @ 2016-01-28 16:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Thu 28-01-16 10:40:10, Christoph Lameter wrote:
> On Thu, 28 Jan 2016, Michal Hocko wrote:
> 
> > +void quiet_vmstat(void)
> > +{
> > +	if (system_state != SYSTEM_RUNNING)
> > +		return;
> > +
> > +	/*
> > +	 * If we are already in hands of the shepherd then there
> > +	 * is nothing for us to do here.
> > +	 */
> > +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> > +		return;
> > +
> > +	if (!need_update(smp_processor_id()))
> > +		return;
> > +
> 
> You can drop the need_update check. refresh_cpu_vm_stats() does the same
> checks in a more efficient way. If you keep this the checks will
> performed twice.

refresh_cpu_vm_stats does this_cpu_xchg for each counter. Is it really
more effective than memchr_inv?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)
  2016-01-28 16:53                                           ` Michal Hocko
@ 2016-01-28 17:05                                             ` Christoph Lameter
  0 siblings, 0 replies; 114+ messages in thread
From: Christoph Lameter @ 2016-01-28 17:05 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML

On Thu, 28 Jan 2016, Michal Hocko wrote:

> > You can drop the need_update check. refresh_cpu_vm_stats() does the same
> > checks in a more efficient way. If you keep this the checks will
> > performed twice.
>
> refresh_cpu_vm_stats does this_cpu_xchg for each counter. Is it really
> more effective than memchr_inv?

Ok makes sense.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2016-01-28 17:05 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-19  0:33 mm, vmstat: kernel BUG at mm/vmstat.c:1408! Sasha Levin
2015-12-19  0:33 ` Sasha Levin
2015-12-21 13:08 ` Christoph Lameter
2015-12-21 13:08   ` Christoph Lameter
2015-12-21 20:28   ` Sasha Levin
2015-12-21 20:28     ` Sasha Levin
2015-12-21 21:07     ` Sasha Levin
2015-12-21 21:07       ` Sasha Levin
2015-12-21 21:14       ` Christoph Lameter
2015-12-21 21:14         ` Christoph Lameter
2015-12-22 17:21         ` Christoph Lameter
2015-12-22 17:21           ` Christoph Lameter
2015-12-24 20:14         ` Sasha Levin
2015-12-29 17:01           ` Christoph Lameter
2015-12-29 17:01             ` Christoph Lameter
2015-12-29 17:18             ` Christoph Lameter
2015-12-29 17:18               ` Christoph Lameter
2016-01-04 18:05           ` Christoph Lameter
2016-01-04 18:05             ` Christoph Lameter
2016-01-04 18:46             ` Sasha Levin
2016-01-04 18:46               ` Sasha Levin
2016-01-12 11:31   ` Shiraz Hashim
2016-01-12 11:31     ` Shiraz Hashim
2016-01-12 12:23     ` Christoph Lameter
2016-01-12 12:23       ` Christoph Lameter
2016-01-12 12:27       ` Shiraz Hashim
2016-01-12 12:27         ` Shiraz Hashim
2016-01-13 11:36       ` Shiraz Hashim
2016-01-13 11:36         ` Shiraz Hashim
2016-01-13 12:32         ` Shiraz Hashim
2016-01-13 12:32           ` Shiraz Hashim
2016-01-14 21:06         ` Sasha Levin
2016-01-14 21:06           ` Sasha Levin
2016-01-20 14:37 ` Michal Hocko
2016-01-20 14:37   ` Michal Hocko
2016-01-20 14:56   ` Sasha Levin
2016-01-20 14:56     ` Sasha Levin
2016-01-20 15:10     ` Michal Hocko
2016-01-20 15:10       ` Michal Hocko
2016-01-20 15:20       ` Christoph Lameter
2016-01-20 15:20         ` Christoph Lameter
2016-01-20 15:49         ` Sasha Levin
2016-01-20 15:49           ` Sasha Levin
2016-01-20 15:55           ` Christoph Lameter
2016-01-20 15:55             ` Christoph Lameter
2016-01-20 21:28             ` Michal Hocko
2016-01-20 21:28               ` Michal Hocko
2016-01-20 21:57               ` Christoph Lameter
2016-01-20 21:57                 ` Christoph Lameter
2016-01-21  8:24                 ` Michal Hocko
2016-01-21  8:24                   ` Michal Hocko
2016-01-21 15:45                   ` Christoph Lameter
2016-01-21 15:45                     ` Christoph Lameter
2016-01-21 16:51                     ` Michal Hocko
2016-01-21 16:51                       ` Michal Hocko
2016-01-21 17:38                       ` Christoph Lameter
2016-01-21 17:38                         ` Christoph Lameter
2016-01-22 11:00                         ` Shiraz Hashim
2016-01-22 11:00                           ` Shiraz Hashim
2016-01-22 14:04                         ` Michal Hocko
2016-01-22 14:04                           ` Michal Hocko
2016-01-22 16:07                           ` Christoph Lameter
2016-01-22 16:07                             ` Christoph Lameter
2016-01-22 16:12                             ` Michal Hocko
2016-01-22 16:12                               ` Michal Hocko
2016-01-22 16:46                               ` Christoph Lameter
2016-01-22 16:46                                 ` Christoph Lameter
2016-01-22 17:12                                 ` Michal Hocko
2016-01-22 17:12                                   ` Michal Hocko
2016-01-23 16:21                                 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith
2016-01-24  0:33                                   ` Christoph Lameter
2016-01-24  2:46                                     ` Mike Galbraith
2016-01-24  3:46                                       ` Christoph Lameter
2016-01-24  5:36                                         ` Mike Galbraith
2016-01-25 17:42                                   ` Michal Hocko
2016-01-25 18:02                                     ` Christoph Lameter
2016-01-25 20:13                                       ` Michal Hocko
2016-01-26 16:25                                         ` Christoph Lameter
2016-01-26 18:31                                           ` Mike Galbraith
2016-01-26 18:34                                             ` Christoph Lameter
2016-01-26 18:45                                               ` Mike Galbraith
2016-01-26 19:20                                                 ` Christoph Lameter
2016-01-27  3:12                                                   ` Mike Galbraith
2016-01-27  4:15                                                     ` Mike Galbraith
2016-01-27 16:28                                                     ` Christoph Lameter
2016-01-28 15:36                                                       ` Frederic Weisbecker
2016-01-28 16:42                                                         ` Christoph Lameter
2016-01-26  2:14                                       ` Mike Galbraith
2016-01-26  2:25                                         ` Mike Galbraith
2016-01-26 16:26                                           ` Christoph Lameter
2016-01-26 17:39                                             ` Mike Galbraith
2016-01-26 18:19                                               ` Christoph Lameter
2016-01-26 16:26                                         ` Christoph Lameter
2016-01-26 17:08                                           ` Mike Galbraith
2016-01-26 18:22                                             ` Christoph Lameter
2016-01-26 19:09                                               ` Mike Galbraith
2016-01-26 19:22                                                 ` Christoph Lameter
2016-01-27 16:48                                   ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko
2016-01-27 17:03                                     ` Mike Galbraith
2016-01-27 18:26                                     ` Christoph Lameter
2016-01-28 15:21                                       ` Michal Hocko
2016-01-28 16:40                                         ` Christoph Lameter
2016-01-28 16:53                                           ` Michal Hocko
2016-01-28 17:05                                             ` Christoph Lameter
2016-01-28 15:31                                       ` Michal Hocko
2016-01-28 15:37                                         ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko
2016-01-28 16:48                                           ` Christoph Lameter
2016-01-28 16:42                                         ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter
2016-01-24 16:57                                 ` mm, vmstat: kernel BUG at mm/vmstat.c:1408! Linus Torvalds
2016-01-24 16:57                                   ` Linus Torvalds
2016-01-20 15:14   ` Christoph Lameter
2016-01-20 15:14     ` Christoph Lameter
2016-01-20 15:20     ` Michal Hocko
2016-01-20 15:20       ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.