* mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-19 0:33 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-19 0:33 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm Hi all, I've started seeing the following in the latest -next kernel. [ 531.127489] kernel BUG at mm/vmstat.c:1408! [ 531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 531.128872] Modules linked in: [ 531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750 [ 531.130939] Workqueue: vmstat vmstat_update [ 531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000 [ 531.133189] RIP: vmstat_update (mm/vmstat.c:1408) [ 531.134466] RSP: 0018:ffff88020407fbf8 EFLAGS: 00010293 [ 531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000 [ 531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4 [ 531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230 [ 531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60 [ 531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0 [ 531.140261] FS: 0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000 [ 531.141218] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0 [ 531.142752] Stack: [ 531.142963] ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8 [ 531.144095] ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb [ 531.145258] ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000 [ 531.146475] Call Trace: [ 531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045) [ 531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171) [ 531.155176] kthread (kernel/kthread.c:209) [ 531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469) [ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c All code ======== 0: 75 1e jne 0x20 2: be 79 00 00 00 mov $0x79,%esi 7: 48 c7 c7 80 0f 10 8c mov $0xffffffff8c100f80,%rdi e: 89 45 e4 mov %eax,-0x1c(%rbp) 11: e8 cd 92 cd ff callq 0xffffffffffcd92e3 16: 8b 45 e4 mov -0x1c(%rbp),%eax 19: c6 05 e1 c4 13 1a 01 movb $0x1,0x1a13c4e1(%rip) # 0x1a13c501 20: 89 c0 mov %eax,%eax 22: f0 48 0f ab 03 lock bts %rax,(%rbx) 27: 72 02 jb 0x2b 29: eb 0e jmp 0x39 2b:* 0f 0b ud2 <-- trapping instruction 2d: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi 34: e8 3d 03 ae 01 callq 0x1ae0376 39: 48 83 c4 08 add $0x8,%rsp 3d: 5b pop %rbx 3e: 41 5c pop %r12 ... Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi 9: e8 3d 03 ae 01 callq 0x1ae034b e: 48 83 c4 08 add $0x8,%rsp 12: 5b pop %rbx 13: 41 5c pop %r12 ... [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) [ 531.165523] RSP <ffff88020407fbf8> Thanks, Sasha ^ permalink raw reply [flat|nested] 114+ messages in thread
* mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-19 0:33 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-19 0:33 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm Hi all, I've started seeing the following in the latest -next kernel. [ 531.127489] kernel BUG at mm/vmstat.c:1408! [ 531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 531.128872] Modules linked in: [ 531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750 [ 531.130939] Workqueue: vmstat vmstat_update [ 531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000 [ 531.133189] RIP: vmstat_update (mm/vmstat.c:1408) [ 531.134466] RSP: 0018:ffff88020407fbf8 EFLAGS: 00010293 [ 531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000 [ 531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4 [ 531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230 [ 531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60 [ 531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0 [ 531.140261] FS: 0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000 [ 531.141218] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0 [ 531.142752] Stack: [ 531.142963] ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8 [ 531.144095] ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb [ 531.145258] ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000 [ 531.146475] Call Trace: [ 531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045) [ 531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171) [ 531.155176] kthread (kernel/kthread.c:209) [ 531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469) [ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c All code ======== 0: 75 1e jne 0x20 2: be 79 00 00 00 mov $0x79,%esi 7: 48 c7 c7 80 0f 10 8c mov $0xffffffff8c100f80,%rdi e: 89 45 e4 mov %eax,-0x1c(%rbp) 11: e8 cd 92 cd ff callq 0xffffffffffcd92e3 16: 8b 45 e4 mov -0x1c(%rbp),%eax 19: c6 05 e1 c4 13 1a 01 movb $0x1,0x1a13c4e1(%rip) # 0x1a13c501 20: 89 c0 mov %eax,%eax 22: f0 48 0f ab 03 lock bts %rax,(%rbx) 27: 72 02 jb 0x2b 29: eb 0e jmp 0x39 2b:* 0f 0b ud2 <-- trapping instruction 2d: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi 34: e8 3d 03 ae 01 callq 0x1ae0376 39: 48 83 c4 08 add $0x8,%rsp 3d: 5b pop %rbx 3e: 41 5c pop %r12 ... Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi 9: e8 3d 03 ae 01 callq 0x1ae034b e: 48 83 c4 08 add $0x8,%rsp 12: 5b pop %rbx 13: 41 5c pop %r12 ... [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) [ 531.165523] RSP <ffff88020407fbf8> Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-19 0:33 ` Sasha Levin @ 2015-12-21 13:08 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-21 13:08 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Fri, 18 Dec 2015, Sasha Levin wrote: > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) Hmmm.. Yes we need to fold the diffs first before disabling the timer otherwise the shepherd task may intervene. Does this patch fix it? Subject: quiet_vmstat: Avoid race with shepherd by folding counters first We need to fold the counters first otherwise the shepherd task may remotely reactivate the vmstat worker. This also avoids the strange loop. Nothing can really increase the counters at that point since we are in the cpu idle loop. So folding the counters once is enough. Cancelling work that does not exist is fine too so just avoid the branches completely. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); + refresh_cpu_vm_stats(false); + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } /* ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-21 13:08 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-21 13:08 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Fri, 18 Dec 2015, Sasha Levin wrote: > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) Hmmm.. Yes we need to fold the diffs first before disabling the timer otherwise the shepherd task may intervene. Does this patch fix it? Subject: quiet_vmstat: Avoid race with shepherd by folding counters first We need to fold the counters first otherwise the shepherd task may remotely reactivate the vmstat worker. This also avoids the strange loop. Nothing can really increase the counters at that point since we are in the cpu idle loop. So folding the counters once is enough. Cancelling work that does not exist is fine too so just avoid the branches completely. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); + refresh_cpu_vm_stats(false); + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-21 13:08 ` Christoph Lameter @ 2015-12-21 20:28 ` Sasha Levin -1 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-21 20:28 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 12/21/2015 08:08 AM, Christoph Lameter wrote: > On Fri, 18 Dec 2015, Sasha Levin wrote: > >> > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) > Hmmm.. Yes we need to fold the diffs first before disabling the timer > otherwise the shepherd task may intervene. > > Does this patch fix it? It didn't. With the patch I'm still seeing: [ 1997.674112] kernel BUG at mm/vmstat.c:1408! [ 1997.674930] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1997.676252] Modules linked in: [ 1997.676880] CPU: 4 PID: 14713 Comm: kworker/4:0 Not tainted 4.4.0-rc5-next-20151221-sasha-00020-g840272e-dirty #2753 [ 1997.679262] Workqueue: vmstat vmstat_update [ 1997.680109] task: ffff88015bca0000 ti: ffff88015bcb8000 task.ti: ffff88015bcb8000 [ 1997.681279] RIP: 0010:[<ffffffffa2683c48>] [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0 [ 1997.682810] RSP: 0018:ffff88015bcbfc00 EFLAGS: 00010297 [ 1997.683611] RAX: 0000000000000004 RBX: ffff8803d2801a18 RCX: 0000000000000000 [ 1997.684689] RDX: 0000000000000007 RSI: ffffffffad098220 RDI: ffffffffbc8be724 [ 1997.685750] RBP: ffff88015bcbfc20 R08: 0000000000000000 R09: ffff88015bca0230 [ 1997.686812] R10: ffffffffad098220 R11: ffff880180a1be78 R12: ffff880180a1be60 [ 1997.688087] R13: ffff880157b27908 R14: ffff880180a21000 R15: ffff880157b27900 [ 1997.689120] FS: 0000000000000000(0000) GS:ffff880180a00000(0000) knlGS:0000000000000000 [ 1997.690290] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1997.691141] CR2: 00007f05e5690000 CR3: 0000000158699000 CR4: 00000000000006a0 [ 1997.692189] Stack: [ 1997.692496] ffff880180a21000 ffff880157b27918 ffff880180a1be60 ffff880157b27908 [ 1997.693689] ffff88015bcbfd40 ffffffffa23aa40f ffffffffaf66686b ffff880157b27948 [ 1997.694851] ffff880180a1be68 ffff880100000000 ffff880157b27910 ffff880157b27920 [ 1997.696013] Call Trace: [ 1997.696542] [<ffffffffa23aa40f>] process_one_work+0xacf/0x12a0 [ 1997.697516] [<ffffffffa23a9940>] ? cancel_delayed_work_sync+0x10/0x10 [ 1997.698889] [<fffffffface0df27>] ? __schedule+0x1127/0x14c0 [ 1997.699738] [<ffffffffa23e11bd>] ? get_parent_ip+0xd/0x40 [ 1997.700547] [<ffffffffa23e12d9>] ? preempt_count_add+0xe9/0x140 [ 1997.701426] [<ffffffffa23ab898>] worker_thread+0xcb8/0x1090 [ 1997.702259] [<ffffffffa23d2ad0>] ? load_mm_ldt+0x1f0/0x1f0 [ 1997.703084] [<ffffffffa23d5b53>] ? update_rq_clock+0x123/0x2e0 [ 1997.703962] [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0 [ 1997.704896] [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0 [ 1997.705804] [<ffffffffa23bf8ce>] kthread+0x31e/0x330 [ 1997.706551] [<ffffffffa23d3195>] ? finish_task_switch+0x6c5/0x970 [ 1997.707481] [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680 [ 1997.708374] [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680 [ 1997.709269] [<fffffffface1a50f>] ret_from_fork+0x3f/0x70 [ 1997.710067] [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680 [ 1997.710964] Code: 75 1e be 79 00 00 00 48 c7 c7 40 16 10 ad 89 45 e4 e8 2d 88 cd ff 8b 45 e4 c6 05 a0 a9 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 21 48 b1 e8 45 d6 ad 01 48 83 c4 08 5b 41 5c [ 1997.714961] RIP [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0 [ 1997.715852] RSP <ffff88015bcbfc00> Thanks, Sasha ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-21 20:28 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-21 20:28 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 12/21/2015 08:08 AM, Christoph Lameter wrote: > On Fri, 18 Dec 2015, Sasha Levin wrote: > >> > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) > Hmmm.. Yes we need to fold the diffs first before disabling the timer > otherwise the shepherd task may intervene. > > Does this patch fix it? It didn't. With the patch I'm still seeing: [ 1997.674112] kernel BUG at mm/vmstat.c:1408! [ 1997.674930] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1997.676252] Modules linked in: [ 1997.676880] CPU: 4 PID: 14713 Comm: kworker/4:0 Not tainted 4.4.0-rc5-next-20151221-sasha-00020-g840272e-dirty #2753 [ 1997.679262] Workqueue: vmstat vmstat_update [ 1997.680109] task: ffff88015bca0000 ti: ffff88015bcb8000 task.ti: ffff88015bcb8000 [ 1997.681279] RIP: 0010:[<ffffffffa2683c48>] [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0 [ 1997.682810] RSP: 0018:ffff88015bcbfc00 EFLAGS: 00010297 [ 1997.683611] RAX: 0000000000000004 RBX: ffff8803d2801a18 RCX: 0000000000000000 [ 1997.684689] RDX: 0000000000000007 RSI: ffffffffad098220 RDI: ffffffffbc8be724 [ 1997.685750] RBP: ffff88015bcbfc20 R08: 0000000000000000 R09: ffff88015bca0230 [ 1997.686812] R10: ffffffffad098220 R11: ffff880180a1be78 R12: ffff880180a1be60 [ 1997.688087] R13: ffff880157b27908 R14: ffff880180a21000 R15: ffff880157b27900 [ 1997.689120] FS: 0000000000000000(0000) GS:ffff880180a00000(0000) knlGS:0000000000000000 [ 1997.690290] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1997.691141] CR2: 00007f05e5690000 CR3: 0000000158699000 CR4: 00000000000006a0 [ 1997.692189] Stack: [ 1997.692496] ffff880180a21000 ffff880157b27918 ffff880180a1be60 ffff880157b27908 [ 1997.693689] ffff88015bcbfd40 ffffffffa23aa40f ffffffffaf66686b ffff880157b27948 [ 1997.694851] ffff880180a1be68 ffff880100000000 ffff880157b27910 ffff880157b27920 [ 1997.696013] Call Trace: [ 1997.696542] [<ffffffffa23aa40f>] process_one_work+0xacf/0x12a0 [ 1997.697516] [<ffffffffa23a9940>] ? cancel_delayed_work_sync+0x10/0x10 [ 1997.698889] [<fffffffface0df27>] ? __schedule+0x1127/0x14c0 [ 1997.699738] [<ffffffffa23e11bd>] ? get_parent_ip+0xd/0x40 [ 1997.700547] [<ffffffffa23e12d9>] ? preempt_count_add+0xe9/0x140 [ 1997.701426] [<ffffffffa23ab898>] worker_thread+0xcb8/0x1090 [ 1997.702259] [<ffffffffa23d2ad0>] ? load_mm_ldt+0x1f0/0x1f0 [ 1997.703084] [<ffffffffa23d5b53>] ? update_rq_clock+0x123/0x2e0 [ 1997.703962] [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0 [ 1997.704896] [<ffffffffa23aabe0>] ? process_one_work+0x12a0/0x12a0 [ 1997.705804] [<ffffffffa23bf8ce>] kthread+0x31e/0x330 [ 1997.706551] [<ffffffffa23d3195>] ? finish_task_switch+0x6c5/0x970 [ 1997.707481] [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680 [ 1997.708374] [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680 [ 1997.709269] [<fffffffface1a50f>] ret_from_fork+0x3f/0x70 [ 1997.710067] [<ffffffffa23bf5b0>] ? kthread_worker_fn+0x680/0x680 [ 1997.710964] Code: 75 1e be 79 00 00 00 48 c7 c7 40 16 10 ad 89 45 e4 e8 2d 88 cd ff 8b 45 e4 c6 05 a0 a9 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 21 48 b1 e8 45 d6 ad 01 48 83 c4 08 5b 41 5c [ 1997.714961] RIP [<ffffffffa2683c48>] vmstat_update+0x178/0x1a0 [ 1997.715852] RSP <ffff88015bcbfc00> Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-21 20:28 ` Sasha Levin @ 2015-12-21 21:07 ` Sasha Levin -1 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-21 21:07 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 12/21/2015 03:28 PM, Sasha Levin wrote: > On 12/21/2015 08:08 AM, Christoph Lameter wrote: >> > On Fri, 18 Dec 2015, Sasha Levin wrote: >> > >>>> >> > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) >> > Hmmm.. Yes we need to fold the diffs first before disabling the timer >> > otherwise the shepherd task may intervene. >> > >> > Does this patch fix it? > It didn't. With the patch I'm still seeing: I've also noticed a new warning from the workqueue code which my scripts didn't pick up before: [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s! [ 3462.522041] workqueue vmstat: flags=0xc [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 [ 3462.554836] pending: vmstat_update Thanks, Sasha ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-21 21:07 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-21 21:07 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 12/21/2015 03:28 PM, Sasha Levin wrote: > On 12/21/2015 08:08 AM, Christoph Lameter wrote: >> > On Fri, 18 Dec 2015, Sasha Levin wrote: >> > >>>> >> > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) >> > Hmmm.. Yes we need to fold the diffs first before disabling the timer >> > otherwise the shepherd task may intervene. >> > >> > Does this patch fix it? > It didn't. With the patch I'm still seeing: I've also noticed a new warning from the workqueue code which my scripts didn't pick up before: [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s! [ 3462.522041] workqueue vmstat: flags=0xc [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 [ 3462.554836] pending: vmstat_update Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-21 21:07 ` Sasha Levin @ 2015-12-21 21:14 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-21 21:14 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Mon, 21 Dec 2015, Sasha Levin wrote: > I've also noticed a new warning from the workqueue code which my scripts > didn't pick up before: > > [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s! > [ 3462.522041] workqueue vmstat: flags=0xc > [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 > [ 3462.554836] pending: vmstat_update Does that mean that vmstat_update locks up or something that schedules it? Also what workload triggers the BUG()? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-21 21:14 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-21 21:14 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Mon, 21 Dec 2015, Sasha Levin wrote: > I've also noticed a new warning from the workqueue code which my scripts > didn't pick up before: > > [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s! > [ 3462.522041] workqueue vmstat: flags=0xc > [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 > [ 3462.554836] pending: vmstat_update Does that mean that vmstat_update locks up or something that schedules it? Also what workload triggers the BUG()? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-21 21:14 ` Christoph Lameter @ 2015-12-22 17:21 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-22 17:21 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm Ran this here but no errors. Need config etc to reproduce. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-22 17:21 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-22 17:21 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm Ran this here but no errors. Need config etc to reproduce. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-21 21:14 ` Christoph Lameter (?) (?) @ 2015-12-24 20:14 ` Sasha Levin 2015-12-29 17:01 ` Christoph Lameter 2016-01-04 18:05 ` Christoph Lameter -1 siblings, 2 replies; 114+ messages in thread From: Sasha Levin @ 2015-12-24 20:14 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm [-- Attachment #1: Type: text/plain, Size: 741 bytes --] On 12/21/2015 04:14 PM, Christoph Lameter wrote: > On Mon, 21 Dec 2015, Sasha Levin wrote: > >> > I've also noticed a new warning from the workqueue code which my scripts >> > didn't pick up before: >> > >> > [ 3462.380681] BUG: workqueue lockup - pool cpus=2 node=2 flags=0x4 nice=0 stuck for 54s! >> > [ 3462.522041] workqueue vmstat: flags=0xc >> > [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 >> > [ 3462.554836] pending: vmstat_update > Does that mean that vmstat_update locks up or something that schedules it? I think it means that vmstat_update didn't finish running (working). > Also what workload triggers the BUG()? Fuzzing with trinity inside a KVM guest. I've attached my config. Thanks, Sasha [-- Attachment #2: config-sasha.gz --] [-- Type: application/gzip, Size: 43528 bytes --] ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-24 20:14 ` Sasha Levin @ 2015-12-29 17:01 ` Christoph Lameter 2016-01-04 18:05 ` Christoph Lameter 1 sibling, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-29 17:01 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Thu, 24 Dec 2015, Sasha Levin wrote: > >> > [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 > >> > [ 3462.554836] pending: vmstat_update > > Does that mean that vmstat_update locks up or something that schedules it? > > I think it means that vmstat_update didn't finish running (working). There is nothing in there that blocks. > > Also what workload triggers the BUG()? > > Fuzzing with trinity inside a KVM guest. I've attached my config. Ok will have a look. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-29 17:01 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-29 17:01 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Thu, 24 Dec 2015, Sasha Levin wrote: > >> > [ 3462.527795] pwq 4: cpus=2 node=2 flags=0x0 nice=0 active=1/256 > >> > [ 3462.554836] pending: vmstat_update > > Does that mean that vmstat_update locks up or something that schedules it? > > I think it means that vmstat_update didn't finish running (working). There is nothing in there that blocks. > > Also what workload triggers the BUG()? > > Fuzzing with trinity inside a KVM guest. I've attached my config. Ok will have a look. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-29 17:01 ` Christoph Lameter @ 2015-12-29 17:18 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-29 17:18 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm Builds with your kernel config fail with CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL scripts/checksyscalls.sh CHK include/generated/compile.h make[1]: *** No rule to make target 'signing_key.pem', needed by 'certs/signing_key.x509'. Stop. Makefile:967: recipe for target 'certs' failed make: *** [certs] Error 2 make: *** Waiting for unfinished jobs.... CHK kernel/config_data.h ??? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2015-12-29 17:18 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2015-12-29 17:18 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm Builds with your kernel config fail with CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL scripts/checksyscalls.sh CHK include/generated/compile.h make[1]: *** No rule to make target 'signing_key.pem', needed by 'certs/signing_key.x509'. Stop. Makefile:967: recipe for target 'certs' failed make: *** [certs] Error 2 make: *** Waiting for unfinished jobs.... CHK kernel/config_data.h ??? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-24 20:14 ` Sasha Levin @ 2016-01-04 18:05 ` Christoph Lameter 2016-01-04 18:05 ` Christoph Lameter 1 sibling, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-04 18:05 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Thu, 24 Dec 2015, Sasha Levin wrote: > > Also what workload triggers the BUG()? > > Fuzzing with trinity inside a KVM guest. I've attached my config. Ok build and bootup works fine after fix from Tetsuo to config. Does not like my initrd it seems. Is there a root with the tools available somehow? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-04 18:05 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-04 18:05 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm On Thu, 24 Dec 2015, Sasha Levin wrote: > > Also what workload triggers the BUG()? > > Fuzzing with trinity inside a KVM guest. I've attached my config. Ok build and bootup works fine after fix from Tetsuo to config. Does not like my initrd it seems. Is there a root with the tools available somehow? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-04 18:05 ` Christoph Lameter @ 2016-01-04 18:46 ` Sasha Levin -1 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-04 18:46 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 01/04/2016 01:05 PM, Christoph Lameter wrote: > On Thu, 24 Dec 2015, Sasha Levin wrote: > >>> Also what workload triggers the BUG()? >> >> Fuzzing with trinity inside a KVM guest. I've attached my config. > > Ok build and bootup works fine after fix from Tetsuo to config. Does not > like my initrd it seems. Is there a root with the tools available somehow? Will is hosting a stand-alone version here: git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git Thanks, Sasha ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-04 18:46 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-04 18:46 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 01/04/2016 01:05 PM, Christoph Lameter wrote: > On Thu, 24 Dec 2015, Sasha Levin wrote: > >>> Also what workload triggers the BUG()? >> >> Fuzzing with trinity inside a KVM guest. I've attached my config. > > Ok build and bootup works fine after fix from Tetsuo to config. Does not > like my initrd it seems. Is there a root with the tools available somehow? Will is hosting a stand-alone version here: git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-21 13:08 ` Christoph Lameter @ 2016-01-12 11:31 ` Shiraz Hashim -1 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-12 11:31 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Mon, Dec 21, 2015 at 6:38 PM, Christoph Lameter <cl@linux.com> wrote: > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) > if (system_state != SYSTEM_RUNNING) > return; > > - do { > - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > - > - } while (refresh_cpu_vm_stats(false)); > + refresh_cpu_vm_stats(false); > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > shouldn't this be cancel_delayed_work_sync ? > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > else ongoing vmstat_update can again encounter cpu_stat_off set ? > } > -- regards Shiraz Hashim ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-12 11:31 ` Shiraz Hashim 0 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-12 11:31 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Mon, Dec 21, 2015 at 6:38 PM, Christoph Lameter <cl@linux.com> wrote: > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) > if (system_state != SYSTEM_RUNNING) > return; > > - do { > - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > - > - } while (refresh_cpu_vm_stats(false)); > + refresh_cpu_vm_stats(false); > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > shouldn't this be cancel_delayed_work_sync ? > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > else ongoing vmstat_update can again encounter cpu_stat_off set ? > } > -- regards Shiraz Hashim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-12 11:31 ` Shiraz Hashim @ 2016-01-12 12:23 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-12 12:23 UTC (permalink / raw) To: Shiraz Hashim; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Tue, 12 Jan 2016, Shiraz Hashim wrote: > > + refresh_cpu_vm_stats(false); > > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > > > > shouldn't this be cancel_delayed_work_sync ? Hmmm... This is executed with preemption off and the work is on the same cpu. If it would be able to run concurrently then we would need this. Ok but it could run from the timer interrupt if that is still on and occuring shortly before we go idle. Guess this needs to be similar to the code we execute on cpu down in the vmstat notifiers (see vmstat_cpuup_callback). Does this fix it? I have not been able to reproduce the issue so far. Patch against -next. Subject: vmstat: Use delayed work_sync and avoid loop. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); + refresh_cpu_vm_stats(false); + cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work)); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } /* ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-12 12:23 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-12 12:23 UTC (permalink / raw) To: Shiraz Hashim; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Tue, 12 Jan 2016, Shiraz Hashim wrote: > > + refresh_cpu_vm_stats(false); > > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > > > > shouldn't this be cancel_delayed_work_sync ? Hmmm... This is executed with preemption off and the work is on the same cpu. If it would be able to run concurrently then we would need this. Ok but it could run from the timer interrupt if that is still on and occuring shortly before we go idle. Guess this needs to be similar to the code we execute on cpu down in the vmstat notifiers (see vmstat_cpuup_callback). Does this fix it? I have not been able to reproduce the issue so far. Patch against -next. Subject: vmstat: Use delayed work_sync and avoid loop. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); + refresh_cpu_vm_stats(false); + cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work)); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-12 12:23 ` Christoph Lameter @ 2016-01-12 12:27 ` Shiraz Hashim -1 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-12 12:27 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote: > Does this fix it? I have not been able to reproduce the issue so far. I too am not able to reproduce. Was just thinking what else can go wrong in Sasha's setup. -- regards Shiraz Hashim ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-12 12:27 ` Shiraz Hashim 0 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-12 12:27 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote: > Does this fix it? I have not been able to reproduce the issue so far. I too am not able to reproduce. Was just thinking what else can go wrong in Sasha's setup. -- regards Shiraz Hashim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-12 12:23 ` Christoph Lameter @ 2016-01-13 11:36 ` Shiraz Hashim -1 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-13 11:36 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm Hi Sasha, On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote: > On Tue, 12 Jan 2016, Shiraz Hashim wrote: > >> > + refresh_cpu_vm_stats(false); >> > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); >> > >> >> shouldn't this be cancel_delayed_work_sync ? > > Hmmm... This is executed with preemption off and the work is on the same > cpu. If it would be able to run concurrently then we would need this. > > Ok but it could run from the timer interrupt if that is still on and > occuring shortly before we go idle. Guess this needs to be similar to > the code we execute on cpu down in the vmstat notifiers (see > vmstat_cpuup_callback). > > Does this fix it? I have not been able to reproduce the issue so far. > > Patch against -next. > > > > Subject: vmstat: Use delayed work_sync and avoid loop. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) > if (system_state != SYSTEM_RUNNING) > return; > > - do { > - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > - > - } while (refresh_cpu_vm_stats(false)); > + refresh_cpu_vm_stats(false); > + cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work)); > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > } > > /* Can you please give it a try, seems it is reproducing easily at your end. -- regards Shiraz Hashim ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-13 11:36 ` Shiraz Hashim 0 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-13 11:36 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm Hi Sasha, On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote: > On Tue, 12 Jan 2016, Shiraz Hashim wrote: > >> > + refresh_cpu_vm_stats(false); >> > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); >> > >> >> shouldn't this be cancel_delayed_work_sync ? > > Hmmm... This is executed with preemption off and the work is on the same > cpu. If it would be able to run concurrently then we would need this. > > Ok but it could run from the timer interrupt if that is still on and > occuring shortly before we go idle. Guess this needs to be similar to > the code we execute on cpu down in the vmstat notifiers (see > vmstat_cpuup_callback). > > Does this fix it? I have not been able to reproduce the issue so far. > > Patch against -next. > > > > Subject: vmstat: Use delayed work_sync and avoid loop. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) > if (system_state != SYSTEM_RUNNING) > return; > > - do { > - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > - > - } while (refresh_cpu_vm_stats(false)); > + refresh_cpu_vm_stats(false); > + cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work)); > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > } > > /* Can you please give it a try, seems it is reproducing easily at your end. -- regards Shiraz Hashim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-13 11:36 ` Shiraz Hashim @ 2016-01-13 12:32 ` Shiraz Hashim -1 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-13 12:32 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Wed, Jan 13, 2016 at 5:06 PM, Shiraz Hashim <shiraz.linux.kernel@gmail.com> wrote: > > Can you please give it a try, seems it is reproducing easily at your end. > Ahh!, it cannot be done as it is called with preemption disabled. [ 8.131226] Process swapper/4 (pid: 0, stack limit = 0xffffffc06fc18058) [ 8.137908] Call trace: [ 8.140344] [<ffffffc0000c19cc>] __might_sleep+0x15c/0x16c [ 8.145812] [<ffffffc0000b3d64>] flush_work+0x3c/0x190 [ 8.150933] [<ffffffc0000b4450>] __cancel_work_timer+0x138/0x1e4 [ 8.156922] [<ffffffc0000b45a8>] cancel_delayed_work_sync+0xc/0x18 [ 8.163087] [<ffffffc0001782dc>] quiet_vmstat+0x3c/0x60 [ 8.168294] [<ffffffc0000dd2f0>] cpu_startup_entry+0x2c/0x330 [ 8.174024] [<ffffffc000090b24>] secondary_start_kernel+0x124/0x134 -- regards Shiraz Hashim ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-13 12:32 ` Shiraz Hashim 0 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-13 12:32 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, Michal Hocko, LKML, linux-mm On Wed, Jan 13, 2016 at 5:06 PM, Shiraz Hashim <shiraz.linux.kernel@gmail.com> wrote: > > Can you please give it a try, seems it is reproducing easily at your end. > Ahh!, it cannot be done as it is called with preemption disabled. [ 8.131226] Process swapper/4 (pid: 0, stack limit = 0xffffffc06fc18058) [ 8.137908] Call trace: [ 8.140344] [<ffffffc0000c19cc>] __might_sleep+0x15c/0x16c [ 8.145812] [<ffffffc0000b3d64>] flush_work+0x3c/0x190 [ 8.150933] [<ffffffc0000b4450>] __cancel_work_timer+0x138/0x1e4 [ 8.156922] [<ffffffc0000b45a8>] cancel_delayed_work_sync+0xc/0x18 [ 8.163087] [<ffffffc0001782dc>] quiet_vmstat+0x3c/0x60 [ 8.168294] [<ffffffc0000dd2f0>] cpu_startup_entry+0x2c/0x330 [ 8.174024] [<ffffffc000090b24>] secondary_start_kernel+0x124/0x134 -- regards Shiraz Hashim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-13 11:36 ` Shiraz Hashim @ 2016-01-14 21:06 ` Sasha Levin -1 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-14 21:06 UTC (permalink / raw) To: Shiraz Hashim, Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 01/13/2016 06:36 AM, Shiraz Hashim wrote: > Hi Sasha, > > On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote: >> On Tue, 12 Jan 2016, Shiraz Hashim wrote: >> >>>> + refresh_cpu_vm_stats(false); >>>> + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); >>>> >>> >>> shouldn't this be cancel_delayed_work_sync ? >> >> Hmmm... This is executed with preemption off and the work is on the same >> cpu. If it would be able to run concurrently then we would need this. >> >> Ok but it could run from the timer interrupt if that is still on and >> occuring shortly before we go idle. Guess this needs to be similar to >> the code we execute on cpu down in the vmstat notifiers (see >> vmstat_cpuup_callback). >> >> Does this fix it? I have not been able to reproduce the issue so far. >> >> Patch against -next. >> >> >> >> Subject: vmstat: Use delayed work_sync and avoid loop. >> >> Signed-off-by: Christoph Lameter <cl@linux.com> >> >> Index: linux/mm/vmstat.c >> =================================================================== >> --- linux.orig/mm/vmstat.c >> +++ linux/mm/vmstat.c >> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) >> if (system_state != SYSTEM_RUNNING) >> return; >> >> - do { >> - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) >> - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); >> - >> - } while (refresh_cpu_vm_stats(false)); >> + refresh_cpu_vm_stats(false); >> + cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work)); >> + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); >> } >> >> /* > > Can you please give it a try, seems it is reproducing easily at your end. > I'm seeing: [3637853.902081] BUG: sleeping function called from invalid context at kernel/workqueue.c:2725 [3637853.904291] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 [3637853.905488] no locks held by swapper/0/0. [3637853.906387] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-next-20160114-sasha-00021-gf1273d1-dirty #2797 [3637853.908369] 1ffffffff5dc0f42 0f76d753b1b5b625 ffffffffaee07a90 ffffffffa2433eee [3637853.909561] 0000000041b58ab3 ffffffffae960c30 ffffffffa2433e26 ffffffffaee49bc0 [3637853.911601] 0000000000000000 0000000000000000 0000000000000000 ffffffffaee07a78 [3637853.913281] Call Trace: [3637853.913775] [<ffffffffa2433eee>] dump_stack+0xc8/0x12a [3637853.914649] [<ffffffffa2433e26>] ? _atomic_dec_and_lock+0x106/0x106 [3637853.915824] [<ffffffffa0440125>] ___might_sleep+0x2c5/0x480 [3637853.917056] [<ffffffffa044033b>] __might_sleep+0x5b/0x260 [3637853.918310] [<ffffffffa040a890>] flush_work+0xe0/0x820 [3637853.919270] [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460 [3637853.921561] [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460 [3637853.922898] [<ffffffffa054d367>] ? del_timer+0x107/0x170 [3637853.924395] [<ffffffffa054d260>] ? lock_timer_base+0x220/0x220 [3637853.925827] [<ffffffffa0411445>] ? try_to_grab_pending+0x115/0x680 [3637853.926924] [<ffffffffa0411b87>] ? __cancel_work_timer+0x1d7/0x5f0 [3637853.928215] [<ffffffffa04d4b58>] ? mark_held_locks+0x1a8/0x220 [3637853.929075] [<ffffffffa0411bdb>] __cancel_work_timer+0x22b/0x5f0 [3637853.930765] [<ffffffffa04119b0>] ? try_to_grab_pending+0x680/0x680 [3637853.931642] [<ffffffffa075fb60>] ? fill_contig_page_info+0x2e0/0x2e0 [3637853.932774] [<ffffffffa0411fda>] cancel_delayed_work_sync+0x1a/0x20 [3637853.933897] [<ffffffffa0766754>] quiet_vmstat+0x74/0x140 [3637853.934991] [<ffffffffa04c20eb>] cpu_startup_entry+0x12b/0x6d0 [3637853.936174] [<ffffffffa04c1fc0>] ? call_cpuidle+0x160/0x160 [3637853.937420] [<ffffffffabfe3816>] rest_init+0x1d6/0x1e0 [3637853.938799] [<ffffffffbb7e4c2a>] start_kernel+0x66c/0x6a6 [3637853.940742] [<ffffffffbb7e45be>] ? thread_info_cache_init+0xb/0xb [3637853.942416] [<ffffffffbb9b0ae7>] ? memblock_reserve+0x59/0x5e [3637853.943880] [<ffffffffbb7e3120>] ? early_idt_handler_array+0x120/0x120 [3637853.945161] [<ffffffffbb7e33c4>] x86_64_start_reservations+0x2a/0x2c [3637853.946005] [<ffffffffbb7e351d>] x86_64_start_kernel+0x157/0x17a ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-14 21:06 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-14 21:06 UTC (permalink / raw) To: Shiraz Hashim, Christoph Lameter; +Cc: Michal Hocko, LKML, linux-mm On 01/13/2016 06:36 AM, Shiraz Hashim wrote: > Hi Sasha, > > On Tue, Jan 12, 2016 at 5:53 PM, Christoph Lameter <cl@linux.com> wrote: >> On Tue, 12 Jan 2016, Shiraz Hashim wrote: >> >>>> + refresh_cpu_vm_stats(false); >>>> + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); >>>> >>> >>> shouldn't this be cancel_delayed_work_sync ? >> >> Hmmm... This is executed with preemption off and the work is on the same >> cpu. If it would be able to run concurrently then we would need this. >> >> Ok but it could run from the timer interrupt if that is still on and >> occuring shortly before we go idle. Guess this needs to be similar to >> the code we execute on cpu down in the vmstat notifiers (see >> vmstat_cpuup_callback). >> >> Does this fix it? I have not been able to reproduce the issue so far. >> >> Patch against -next. >> >> >> >> Subject: vmstat: Use delayed work_sync and avoid loop. >> >> Signed-off-by: Christoph Lameter <cl@linux.com> >> >> Index: linux/mm/vmstat.c >> =================================================================== >> --- linux.orig/mm/vmstat.c >> +++ linux/mm/vmstat.c >> @@ -1419,11 +1419,9 @@ void quiet_vmstat(void) >> if (system_state != SYSTEM_RUNNING) >> return; >> >> - do { >> - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) >> - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); >> - >> - } while (refresh_cpu_vm_stats(false)); >> + refresh_cpu_vm_stats(false); >> + cancel_delayed_work_sync(this_cpu_ptr(&vmstat_work)); >> + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); >> } >> >> /* > > Can you please give it a try, seems it is reproducing easily at your end. > I'm seeing: [3637853.902081] BUG: sleeping function called from invalid context at kernel/workqueue.c:2725 [3637853.904291] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 [3637853.905488] no locks held by swapper/0/0. [3637853.906387] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-next-20160114-sasha-00021-gf1273d1-dirty #2797 [3637853.908369] 1ffffffff5dc0f42 0f76d753b1b5b625 ffffffffaee07a90 ffffffffa2433eee [3637853.909561] 0000000041b58ab3 ffffffffae960c30 ffffffffa2433e26 ffffffffaee49bc0 [3637853.911601] 0000000000000000 0000000000000000 0000000000000000 ffffffffaee07a78 [3637853.913281] Call Trace: [3637853.913775] [<ffffffffa2433eee>] dump_stack+0xc8/0x12a [3637853.914649] [<ffffffffa2433e26>] ? _atomic_dec_and_lock+0x106/0x106 [3637853.915824] [<ffffffffa0440125>] ___might_sleep+0x2c5/0x480 [3637853.917056] [<ffffffffa044033b>] __might_sleep+0x5b/0x260 [3637853.918310] [<ffffffffa040a890>] flush_work+0xe0/0x820 [3637853.919270] [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460 [3637853.921561] [<ffffffffa040a7b0>] ? __queue_delayed_work+0x460/0x460 [3637853.922898] [<ffffffffa054d367>] ? del_timer+0x107/0x170 [3637853.924395] [<ffffffffa054d260>] ? lock_timer_base+0x220/0x220 [3637853.925827] [<ffffffffa0411445>] ? try_to_grab_pending+0x115/0x680 [3637853.926924] [<ffffffffa0411b87>] ? __cancel_work_timer+0x1d7/0x5f0 [3637853.928215] [<ffffffffa04d4b58>] ? mark_held_locks+0x1a8/0x220 [3637853.929075] [<ffffffffa0411bdb>] __cancel_work_timer+0x22b/0x5f0 [3637853.930765] [<ffffffffa04119b0>] ? try_to_grab_pending+0x680/0x680 [3637853.931642] [<ffffffffa075fb60>] ? fill_contig_page_info+0x2e0/0x2e0 [3637853.932774] [<ffffffffa0411fda>] cancel_delayed_work_sync+0x1a/0x20 [3637853.933897] [<ffffffffa0766754>] quiet_vmstat+0x74/0x140 [3637853.934991] [<ffffffffa04c20eb>] cpu_startup_entry+0x12b/0x6d0 [3637853.936174] [<ffffffffa04c1fc0>] ? call_cpuidle+0x160/0x160 [3637853.937420] [<ffffffffabfe3816>] rest_init+0x1d6/0x1e0 [3637853.938799] [<ffffffffbb7e4c2a>] start_kernel+0x66c/0x6a6 [3637853.940742] [<ffffffffbb7e45be>] ? thread_info_cache_init+0xb/0xb [3637853.942416] [<ffffffffbb9b0ae7>] ? memblock_reserve+0x59/0x5e [3637853.943880] [<ffffffffbb7e3120>] ? early_idt_handler_array+0x120/0x120 [3637853.945161] [<ffffffffbb7e33c4>] x86_64_start_reservations+0x2a/0x2c [3637853.946005] [<ffffffffbb7e351d>] x86_64_start_kernel+0x157/0x17a -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2015-12-19 0:33 ` Sasha Levin @ 2016-01-20 14:37 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 14:37 UTC (permalink / raw) To: Sasha Levin, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton [CCing Andrew] I am just reading through this old discussion again because "vmstat: make vmstat_updater deferrable again and shut down on idle" which seems to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see any follow up fix merged to linus tree On Fri 18-12-15 19:33:07, Sasha Levin wrote: > Hi all, > > I've started seeing the following in the latest -next kernel. > > [ 531.127489] kernel BUG at mm/vmstat.c:1408! > [ 531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN > [ 531.128872] Modules linked in: > [ 531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750 > [ 531.130939] Workqueue: vmstat vmstat_update > [ 531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000 > [ 531.133189] RIP: vmstat_update (mm/vmstat.c:1408) > [ 531.134466] RSP: 0018:ffff88020407fbf8 EFLAGS: 00010293 > [ 531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000 > [ 531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4 > [ 531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230 > [ 531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60 > [ 531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0 > [ 531.140261] FS: 0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000 > [ 531.141218] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0 > [ 531.142752] Stack: > [ 531.142963] ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8 > [ 531.144095] ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb > [ 531.145258] ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000 > [ 531.146475] Call Trace: > [ 531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045) > [ 531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171) > [ 531.155176] kthread (kernel/kthread.c:209) > [ 531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469) > [ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c > All code > ======== > 0: 75 1e jne 0x20 > 2: be 79 00 00 00 mov $0x79,%esi > 7: 48 c7 c7 80 0f 10 8c mov $0xffffffff8c100f80,%rdi > e: 89 45 e4 mov %eax,-0x1c(%rbp) > 11: e8 cd 92 cd ff callq 0xffffffffffcd92e3 > 16: 8b 45 e4 mov -0x1c(%rbp),%eax > 19: c6 05 e1 c4 13 1a 01 movb $0x1,0x1a13c4e1(%rip) # 0x1a13c501 > 20: 89 c0 mov %eax,%eax > 22: f0 48 0f ab 03 lock bts %rax,(%rbx) > 27: 72 02 jb 0x2b > 29: eb 0e jmp 0x39 > 2b:* 0f 0b ud2 <-- trapping instruction > 2d: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi > 34: e8 3d 03 ae 01 callq 0x1ae0376 > 39: 48 83 c4 08 add $0x8,%rsp > 3d: 5b pop %rbx > 3e: 41 5c pop %r12 > ... > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi > 9: e8 3d 03 ae 01 callq 0x1ae034b > e: 48 83 c4 08 add $0x8,%rsp > 12: 5b pop %rbx > 13: 41 5c pop %r12 > ... > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) > [ 531.165523] RSP <ffff88020407fbf8> > > > Thanks, > Sasha -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 14:37 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 14:37 UTC (permalink / raw) To: Sasha Levin, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton [CCing Andrew] I am just reading through this old discussion again because "vmstat: make vmstat_updater deferrable again and shut down on idle" which seems to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see any follow up fix merged to linus tree On Fri 18-12-15 19:33:07, Sasha Levin wrote: > Hi all, > > I've started seeing the following in the latest -next kernel. > > [ 531.127489] kernel BUG at mm/vmstat.c:1408! > [ 531.128157] invalid opcode: 0000 [#1] PREEMPT SMP KASAN > [ 531.128872] Modules linked in: > [ 531.129324] CPU: 6 PID: 407 Comm: kworker/6:1 Not tainted 4.4.0-rc5-next-20151218-sasha-00021-gaba8d84-dirty #2750 > [ 531.130939] Workqueue: vmstat vmstat_update > [ 531.131741] task: ffff880204070000 ti: ffff880204078000 task.ti: ffff880204078000 > [ 531.133189] RIP: vmstat_update (mm/vmstat.c:1408) > [ 531.134466] RSP: 0018:ffff88020407fbf8 EFLAGS: 00010293 > [ 531.135132] RAX: 0000000000000006 RBX: ffff8800418e2fd8 RCX: 0000000000000000 > [ 531.135995] RDX: 0000000000000007 RSI: ffffffff8c0982a0 RDI: ffffffff9b8bd6e4 > [ 531.137475] RBP: ffff88020407fc18 R08: 0000000000000000 R09: ffff880204070230 > [ 531.138304] R10: ffffffff8c0982a0 R11: 00000000e272b4f2 R12: ffff880204c1bf60 > [ 531.139329] R13: ffff880204ab09c8 R14: ffff880204ab09b8 R15: ffff880204ab09b0 > [ 531.140261] FS: 0000000000000000(0000) GS:ffff880204c00000(0000) knlGS:0000000000000000 > [ 531.141218] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 531.142036] CR2: 00007f039a8c1944 CR3: 000000000ea28000 CR4: 00000000000006a0 > [ 531.142752] Stack: > [ 531.142963] ffff880204c21000 ffff880204c21000 ffff880204c1bf60 ffff880204ab09c8 > [ 531.144095] ffff88020407fd40 ffffffff813a8fea 0000000041b58ab3 ffffffff8e667cdb > [ 531.145258] ffff880204ab09f8 ffff880204c1bf68 ffff880204ab09c0 ffff880200000000 > [ 531.146475] Call Trace: > [ 531.147037] process_one_work (./arch/x86/include/asm/preempt.h:22 kernel/workqueue.c:2045) > [ 531.150790] worker_thread (include/linux/compiler.h:218 include/linux/list.h:206 kernel/workqueue.c:2171) > [ 531.155176] kthread (kernel/kthread.c:209) > [ 531.158941] ret_from_fork (arch/x86/entry/entry_64.S:469) > [ 531.160654] Code: 75 1e be 79 00 00 00 48 c7 c7 80 0f 10 8c 89 45 e4 e8 cd 92 cd ff 8b 45 e4 c6 05 e1 c4 13 1a 01 89 c0 f0 48 0f ab 03 72 02 eb 0e <0f> 0b 48 c7 c7 c0 f1 47 90 e8 3d 03 ae 01 48 83 c4 08 5b 41 5c > All code > ======== > 0: 75 1e jne 0x20 > 2: be 79 00 00 00 mov $0x79,%esi > 7: 48 c7 c7 80 0f 10 8c mov $0xffffffff8c100f80,%rdi > e: 89 45 e4 mov %eax,-0x1c(%rbp) > 11: e8 cd 92 cd ff callq 0xffffffffffcd92e3 > 16: 8b 45 e4 mov -0x1c(%rbp),%eax > 19: c6 05 e1 c4 13 1a 01 movb $0x1,0x1a13c4e1(%rip) # 0x1a13c501 > 20: 89 c0 mov %eax,%eax > 22: f0 48 0f ab 03 lock bts %rax,(%rbx) > 27: 72 02 jb 0x2b > 29: eb 0e jmp 0x39 > 2b:* 0f 0b ud2 <-- trapping instruction > 2d: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi > 34: e8 3d 03 ae 01 callq 0x1ae0376 > 39: 48 83 c4 08 add $0x8,%rsp > 3d: 5b pop %rbx > 3e: 41 5c pop %r12 > ... > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: 48 c7 c7 c0 f1 47 90 mov $0xffffffff9047f1c0,%rdi > 9: e8 3d 03 ae 01 callq 0x1ae034b > e: 48 83 c4 08 add $0x8,%rsp > 12: 5b pop %rbx > 13: 41 5c pop %r12 > ... > [ 531.164630] RIP vmstat_update (mm/vmstat.c:1408) > [ 531.165523] RSP <ffff88020407fbf8> > > > Thanks, > Sasha -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 14:37 ` Michal Hocko @ 2016-01-20 14:56 ` Sasha Levin -1 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-20 14:56 UTC (permalink / raw) To: Michal Hocko, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton On 01/20/2016 09:37 AM, Michal Hocko wrote: > I am just reading through this old discussion again because "vmstat: > make vmstat_updater deferrable again and shut down on idle" which seems > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > any follow up fix merged to linus tree So this isn't an "old" discussion - the bug is very much there and I can hit it easily. As a workaround I've "disabled" vmstat. Thanks, Sasha ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 14:56 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-20 14:56 UTC (permalink / raw) To: Michal Hocko, Christoph Lameter; +Cc: LKML, linux-mm, Andrew Morton On 01/20/2016 09:37 AM, Michal Hocko wrote: > I am just reading through this old discussion again because "vmstat: > make vmstat_updater deferrable again and shut down on idle" which seems > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > any follow up fix merged to linus tree So this isn't an "old" discussion - the bug is very much there and I can hit it easily. As a workaround I've "disabled" vmstat. Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 14:56 ` Sasha Levin @ 2016-01-20 15:10 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 15:10 UTC (permalink / raw) To: Sasha Levin; +Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton On Wed 20-01-16 09:56:26, Sasha Levin wrote: > On 01/20/2016 09:37 AM, Michal Hocko wrote: > > I am just reading through this old discussion again because "vmstat: > > make vmstat_updater deferrable again and shut down on idle" which seems > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > > any follow up fix merged to linus tree > > So this isn't an "old" discussion - the bug is very much there and I can > hit it easily. As a workaround I've "disabled" vmstat. Well the report is since 18th Dec which is over month old. Should we revert 0eb77e988032 as a pre caution and make sure this is done properly in -mm tree. AFAIR none of the proposed fix worked without other fallouts? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 15:10 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 15:10 UTC (permalink / raw) To: Sasha Levin; +Cc: Christoph Lameter, LKML, linux-mm, Andrew Morton On Wed 20-01-16 09:56:26, Sasha Levin wrote: > On 01/20/2016 09:37 AM, Michal Hocko wrote: > > I am just reading through this old discussion again because "vmstat: > > make vmstat_updater deferrable again and shut down on idle" which seems > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > > any follow up fix merged to linus tree > > So this isn't an "old" discussion - the bug is very much there and I can > hit it easily. As a workaround I've "disabled" vmstat. Well the report is since 18th Dec which is over month old. Should we revert 0eb77e988032 as a pre caution and make sure this is done properly in -mm tree. AFAIR none of the proposed fix worked without other fallouts? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 15:10 ` Michal Hocko @ 2016-01-20 15:20 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 15:20 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Michal Hocko wrote: > On Wed 20-01-16 09:56:26, Sasha Levin wrote: > > On 01/20/2016 09:37 AM, Michal Hocko wrote: > > > I am just reading through this old discussion again because "vmstat: > > > make vmstat_updater deferrable again and shut down on idle" which seems > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > > > any follow up fix merged to linus tree > > > > So this isn't an "old" discussion - the bug is very much there and I can > > hit it easily. As a workaround I've "disabled" vmstat. > > Well the report is since 18th Dec which is over month old. Should we > revert 0eb77e988032 as a pre caution and make sure this is done properly > in -mm tree. AFAIR none of the proposed fix worked without other > fallouts? Seems that we are unable to get enough information to reproduce the issue? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 15:20 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 15:20 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Michal Hocko wrote: > On Wed 20-01-16 09:56:26, Sasha Levin wrote: > > On 01/20/2016 09:37 AM, Michal Hocko wrote: > > > I am just reading through this old discussion again because "vmstat: > > > make vmstat_updater deferrable again and shut down on idle" which seems > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > > > any follow up fix merged to linus tree > > > > So this isn't an "old" discussion - the bug is very much there and I can > > hit it easily. As a workaround I've "disabled" vmstat. > > Well the report is since 18th Dec which is over month old. Should we > revert 0eb77e988032 as a pre caution and make sure this is done properly > in -mm tree. AFAIR none of the proposed fix worked without other > fallouts? Seems that we are unable to get enough information to reproduce the issue? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 15:20 ` Christoph Lameter @ 2016-01-20 15:49 ` Sasha Levin -1 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-20 15:49 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm, Andrew Morton On 01/20/2016 10:20 AM, Christoph Lameter wrote: > On Wed, 20 Jan 2016, Michal Hocko wrote: > >> > On Wed 20-01-16 09:56:26, Sasha Levin wrote: >>> > > On 01/20/2016 09:37 AM, Michal Hocko wrote: >>>> > > > I am just reading through this old discussion again because "vmstat: >>>> > > > make vmstat_updater deferrable again and shut down on idle" which seems >>>> > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see >>>> > > > any follow up fix merged to linus tree >>> > > >>> > > So this isn't an "old" discussion - the bug is very much there and I can >>> > > hit it easily. As a workaround I've "disabled" vmstat. >> > >> > Well the report is since 18th Dec which is over month old. Should we >> > revert 0eb77e988032 as a pre caution and make sure this is done properly >> > in -mm tree. AFAIR none of the proposed fix worked without other >> > fallouts? > Seems that we are unable to get enough information to reproduce the issue? As I've mentioned - this reproduces frequently. I'd be happy to add in debug information into the kernel that might help you reproduce it, but as it seems like a timing issue, I can't provide a simple reproducer. Thanks, Sasha ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 15:49 ` Sasha Levin 0 siblings, 0 replies; 114+ messages in thread From: Sasha Levin @ 2016-01-20 15:49 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: LKML, linux-mm, Andrew Morton On 01/20/2016 10:20 AM, Christoph Lameter wrote: > On Wed, 20 Jan 2016, Michal Hocko wrote: > >> > On Wed 20-01-16 09:56:26, Sasha Levin wrote: >>> > > On 01/20/2016 09:37 AM, Michal Hocko wrote: >>>> > > > I am just reading through this old discussion again because "vmstat: >>>> > > > make vmstat_updater deferrable again and shut down on idle" which seems >>>> > > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see >>>> > > > any follow up fix merged to linus tree >>> > > >>> > > So this isn't an "old" discussion - the bug is very much there and I can >>> > > hit it easily. As a workaround I've "disabled" vmstat. >> > >> > Well the report is since 18th Dec which is over month old. Should we >> > revert 0eb77e988032 as a pre caution and make sure this is done properly >> > in -mm tree. AFAIR none of the proposed fix worked without other >> > fallouts? > Seems that we are unable to get enough information to reproduce the issue? As I've mentioned - this reproduces frequently. I'd be happy to add in debug information into the kernel that might help you reproduce it, but as it seems like a timing issue, I can't provide a simple reproducer. Thanks, Sasha -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 15:49 ` Sasha Levin @ 2016-01-20 15:55 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 15:55 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Sasha Levin wrote: > > As I've mentioned - this reproduces frequently. I'd be happy to add in debug > information into the kernel that might help you reproduce it, but as it seems > like a timing issue, I can't provide a simple reproducer. This isnt really important I think. Lets remove it. Subject: vmstat: Remove BUG_ON from vmstat_update If we detect that there is nothing to do just set the flag and do not check if it was already set before. Races really do not matter. If the flag is set by any code then the shepherd will start dealing with the situation and reenable the vmstat workers when necessary again. Concurrent actions could be onlining and offlining of processors or be a result of concurrency issues when updating the cpumask from multiple processors. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st * Defer the checking for differentials to the * shepherd thread on a different processor. */ - int r; - /* - * Shepherd work thread does not race since it never - * changes the bit if its zero but the cpu - * online / off line code may race if - * worker threads are still allowed during - * shutdown / startup. - */ - r = cpumask_test_and_set_cpu(smp_processor_id(), - cpu_stat_off); - VM_BUG_ON(r); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } } ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 15:55 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 15:55 UTC (permalink / raw) To: Sasha Levin; +Cc: Michal Hocko, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Sasha Levin wrote: > > As I've mentioned - this reproduces frequently. I'd be happy to add in debug > information into the kernel that might help you reproduce it, but as it seems > like a timing issue, I can't provide a simple reproducer. This isnt really important I think. Lets remove it. Subject: vmstat: Remove BUG_ON from vmstat_update If we detect that there is nothing to do just set the flag and do not check if it was already set before. Races really do not matter. If the flag is set by any code then the shepherd will start dealing with the situation and reenable the vmstat workers when necessary again. Concurrent actions could be onlining and offlining of processors or be a result of concurrency issues when updating the cpumask from multiple processors. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st * Defer the checking for differentials to the * shepherd thread on a different processor. */ - int r; - /* - * Shepherd work thread does not race since it never - * changes the bit if its zero but the cpu - * online / off line code may race if - * worker threads are still allowed during - * shutdown / startup. - */ - r = cpumask_test_and_set_cpu(smp_processor_id(), - cpu_stat_off); - VM_BUG_ON(r); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 15:55 ` Christoph Lameter @ 2016-01-20 21:28 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 21:28 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed 20-01-16 09:55:22, Christoph Lameter wrote: [...] > Subject: vmstat: Remove BUG_ON from vmstat_update > > If we detect that there is nothing to do just set the flag and do not check > if it was already set before. Races really do not matter. If the flag is > set by any code then the shepherd will start dealing with the situation > and reenable the vmstat workers when necessary again. > > Concurrent actions could be onlining and offlining of processors or be a > result of concurrency issues when updating the cpumask from multiple > processors. Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle) is merged the VM_BUG_ON is simply bogus because vmstat_update might "race" with quiet_vmstat. The changelog should reflect that. What about the following wording? " Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") quiet_vmstat might update cpu_stat_off and mark a particular cpu to be handled by vmstat_shepherd. This might trigger a VM_BUG_ON in vmstat_update because the work item might have been sleeping during the idle period and see the cpu_stat_off updated after the wake up. The VM_BUG_ON is therefore misleading and no more appropriate. Moreover it doesn't really suite any protection from real bugs because vmstat_shepherd will simply reschedule the vmstat_work anytime it sees a particular cpu set or vmstat_update would do the same from the worker context directly. Even when the two would race the result wouldn't be incorrect as the counters update is fully idempotent. Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") CC: stable # 4.4+ " > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st > * Defer the checking for differentials to the > * shepherd thread on a different processor. > */ > - int r; > - /* > - * Shepherd work thread does not race since it never > - * changes the bit if its zero but the cpu > - * online / off line code may race if > - * worker threads are still allowed during > - * shutdown / startup. > - */ > - r = cpumask_test_and_set_cpu(smp_processor_id(), > - cpu_stat_off); > - VM_BUG_ON(r); > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > } > } -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 21:28 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 21:28 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed 20-01-16 09:55:22, Christoph Lameter wrote: [...] > Subject: vmstat: Remove BUG_ON from vmstat_update > > If we detect that there is nothing to do just set the flag and do not check > if it was already set before. Races really do not matter. If the flag is > set by any code then the shepherd will start dealing with the situation > and reenable the vmstat workers when necessary again. > > Concurrent actions could be onlining and offlining of processors or be a > result of concurrency issues when updating the cpumask from multiple > processors. Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle) is merged the VM_BUG_ON is simply bogus because vmstat_update might "race" with quiet_vmstat. The changelog should reflect that. What about the following wording? " Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") quiet_vmstat might update cpu_stat_off and mark a particular cpu to be handled by vmstat_shepherd. This might trigger a VM_BUG_ON in vmstat_update because the work item might have been sleeping during the idle period and see the cpu_stat_off updated after the wake up. The VM_BUG_ON is therefore misleading and no more appropriate. Moreover it doesn't really suite any protection from real bugs because vmstat_shepherd will simply reschedule the vmstat_work anytime it sees a particular cpu set or vmstat_update would do the same from the worker context directly. Even when the two would race the result wouldn't be incorrect as the counters update is fully idempotent. Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") CC: stable # 4.4+ " > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st > * Defer the checking for differentials to the > * shepherd thread on a different processor. > */ > - int r; > - /* > - * Shepherd work thread does not race since it never > - * changes the bit if its zero but the cpu > - * online / off line code may race if > - * worker threads are still allowed during > - * shutdown / startup. > - */ > - r = cpumask_test_and_set_cpu(smp_processor_id(), > - cpu_stat_off); > - VM_BUG_ON(r); > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > } > } -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 21:28 ` Michal Hocko @ 2016-01-20 21:57 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 21:57 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Michal Hocko wrote: > On Wed 20-01-16 09:55:22, Christoph Lameter wrote: > [...] > > Subject: vmstat: Remove BUG_ON from vmstat_update > > > > If we detect that there is nothing to do just set the flag and do not check > > if it was already set before. Races really do not matter. If the flag is > > set by any code then the shepherd will start dealing with the situation > > and reenable the vmstat workers when necessary again. > > > > Concurrent actions could be onlining and offlining of processors or be a > > result of concurrency issues when updating the cpumask from multiple > > processors. > > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle) is merged the VM_BUG_ON is simply bogus because > vmstat_update might "race" with quiet_vmstat. The changelog should > reflect that. What about the following wording? How can it race if preemption is off? > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > particular cpu to be handled by vmstat_shepherd. This might trigger > a VM_BUG_ON in vmstat_update because the work item might have been > sleeping during the idle period and see the cpu_stat_off updated after > the wake up. The VM_BUG_ON is therefore misleading and no more > appropriate. Moreover it doesn't really suite any protection from real > bugs because vmstat_shepherd will simply reschedule the vmstat_work > anytime it sees a particular cpu set or vmstat_update would do the same > from the worker context directly. Even when the two would race the > result wouldn't be incorrect as the counters update is fully idempotent. Hmmm... the vmstat_update can be interrupted while running and the cpu put into idle mode? If vmstat_update is running then the cpu is not idle but running code. If this is really going on then there is other stuff wrong with the idling logic. > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle") > CC: stable # 4.4+ ?? There has not been an upstream release with this yet. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 21:57 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 21:57 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Michal Hocko wrote: > On Wed 20-01-16 09:55:22, Christoph Lameter wrote: > [...] > > Subject: vmstat: Remove BUG_ON from vmstat_update > > > > If we detect that there is nothing to do just set the flag and do not check > > if it was already set before. Races really do not matter. If the flag is > > set by any code then the shepherd will start dealing with the situation > > and reenable the vmstat workers when necessary again. > > > > Concurrent actions could be onlining and offlining of processors or be a > > result of concurrency issues when updating the cpumask from multiple > > processors. > > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle) is merged the VM_BUG_ON is simply bogus because > vmstat_update might "race" with quiet_vmstat. The changelog should > reflect that. What about the following wording? How can it race if preemption is off? > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > particular cpu to be handled by vmstat_shepherd. This might trigger > a VM_BUG_ON in vmstat_update because the work item might have been > sleeping during the idle period and see the cpu_stat_off updated after > the wake up. The VM_BUG_ON is therefore misleading and no more > appropriate. Moreover it doesn't really suite any protection from real > bugs because vmstat_shepherd will simply reschedule the vmstat_work > anytime it sees a particular cpu set or vmstat_update would do the same > from the worker context directly. Even when the two would race the > result wouldn't be incorrect as the counters update is fully idempotent. Hmmm... the vmstat_update can be interrupted while running and the cpu put into idle mode? If vmstat_update is running then the cpu is not idle but running code. If this is really going on then there is other stuff wrong with the idling logic. > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle") > CC: stable # 4.4+ ?? There has not been an upstream release with this yet. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 21:57 ` Christoph Lameter @ 2016-01-21 8:24 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-21 8:24 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed 20-01-16 15:57:43, Christoph Lameter wrote: > On Wed, 20 Jan 2016, Michal Hocko wrote: > > > On Wed 20-01-16 09:55:22, Christoph Lameter wrote: > > [...] > > > Subject: vmstat: Remove BUG_ON from vmstat_update > > > > > > If we detect that there is nothing to do just set the flag and do not check > > > if it was already set before. Races really do not matter. If the flag is > > > set by any code then the shepherd will start dealing with the situation > > > and reenable the vmstat workers when necessary again. > > > > > > Concurrent actions could be onlining and offlining of processors or be a > > > result of concurrency issues when updating the cpumask from multiple > > > processors. > > > > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and > > shut down on idle) is merged the VM_BUG_ON is simply bogus because > > vmstat_update might "race" with quiet_vmstat. The changelog should > > reflect that. What about the following wording? > > How can it race if preemption is off? See below. > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > > particular cpu to be handled by vmstat_shepherd. This might trigger > > a VM_BUG_ON in vmstat_update because the work item might have been > > sleeping during the idle period and see the cpu_stat_off updated after > > the wake up. The VM_BUG_ON is therefore misleading and no more > > appropriate. Moreover it doesn't really suite any protection from real > > bugs because vmstat_shepherd will simply reschedule the vmstat_work > > anytime it sees a particular cpu set or vmstat_update would do the same > > from the worker context directly. Even when the two would race the > > result wouldn't be incorrect as the counters update is fully idempotent. > > > Hmmm... the vmstat_update can be interrupted while running and the cpu put > into idle mode? If vmstat_update is running then the cpu is not idle but > running code. If this is really going on then there is other stuff wrong > with the idling logic. The vmstat update might be still waiting for its timer, idle mode started and kick vmstat_update which might cpumask_test_and_set_cpu. Once the idle terminates and the originally schedule vmstate_update executes it sees the bit set and BUG_ON. > > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > > shut down on idle") > > CC: stable # 4.4+ > > ?? There has not been an upstream release with this yet. Ohh, I thought it made it into 4.4 but you are right it is post 4.4 so no CC: stable required. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-21 8:24 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-21 8:24 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed 20-01-16 15:57:43, Christoph Lameter wrote: > On Wed, 20 Jan 2016, Michal Hocko wrote: > > > On Wed 20-01-16 09:55:22, Christoph Lameter wrote: > > [...] > > > Subject: vmstat: Remove BUG_ON from vmstat_update > > > > > > If we detect that there is nothing to do just set the flag and do not check > > > if it was already set before. Races really do not matter. If the flag is > > > set by any code then the shepherd will start dealing with the situation > > > and reenable the vmstat workers when necessary again. > > > > > > Concurrent actions could be onlining and offlining of processors or be a > > > result of concurrency issues when updating the cpumask from multiple > > > processors. > > > > Now that 7e988032 ("vmstat: make vmstat_updater deferrable again and > > shut down on idle) is merged the VM_BUG_ON is simply bogus because > > vmstat_update might "race" with quiet_vmstat. The changelog should > > reflect that. What about the following wording? > > How can it race if preemption is off? See below. > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > > particular cpu to be handled by vmstat_shepherd. This might trigger > > a VM_BUG_ON in vmstat_update because the work item might have been > > sleeping during the idle period and see the cpu_stat_off updated after > > the wake up. The VM_BUG_ON is therefore misleading and no more > > appropriate. Moreover it doesn't really suite any protection from real > > bugs because vmstat_shepherd will simply reschedule the vmstat_work > > anytime it sees a particular cpu set or vmstat_update would do the same > > from the worker context directly. Even when the two would race the > > result wouldn't be incorrect as the counters update is fully idempotent. > > > Hmmm... the vmstat_update can be interrupted while running and the cpu put > into idle mode? If vmstat_update is running then the cpu is not idle but > running code. If this is really going on then there is other stuff wrong > with the idling logic. The vmstat update might be still waiting for its timer, idle mode started and kick vmstat_update which might cpumask_test_and_set_cpu. Once the idle terminates and the originally schedule vmstate_update executes it sees the bit set and BUG_ON. > > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > > shut down on idle") > > CC: stable # 4.4+ > > ?? There has not been an upstream release with this yet. Ohh, I thought it made it into 4.4 but you are right it is post 4.4 so no CC: stable required. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-21 8:24 ` Michal Hocko @ 2016-01-21 15:45 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-21 15:45 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu, 21 Jan 2016, Michal Hocko wrote: > > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > > > particular cpu to be handled by vmstat_shepherd. This might trigger > > > a VM_BUG_ON in vmstat_update because the work item might have been > > > sleeping during the idle period and see the cpu_stat_off updated after > > > the wake up. The VM_BUG_ON is therefore misleading and no more > > > appropriate. Moreover it doesn't really suite any protection from real > > > bugs because vmstat_shepherd will simply reschedule the vmstat_work > > > anytime it sees a particular cpu set or vmstat_update would do the same > > > from the worker context directly. Even when the two would race the > > > result wouldn't be incorrect as the counters update is fully idempotent. > > > > > > Hmmm... the vmstat_update can be interrupted while running and the cpu put > > into idle mode? If vmstat_update is running then the cpu is not idle but > > running code. If this is really going on then there is other stuff wrong > > with the idling logic. > > The vmstat update might be still waiting for its timer, idle mode started > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the > idle terminates and the originally schedule vmstate_update executes it > sees the bit set and BUG_ON. Ok so we are going into idle mode and the vmstat_update timer is pending. Then the timer will not fire since going idle switches preemption off. quiet_vmstat will run without the chance of running vmstat_update We could be going idle and not have disabled preemption yet. Then vmstat_update will run. On return to the idling operation preemption will be disabled and quiet_vmstat() will be run. I do not see how these two things could race. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-21 15:45 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-21 15:45 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu, 21 Jan 2016, Michal Hocko wrote: > > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > > > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > > > particular cpu to be handled by vmstat_shepherd. This might trigger > > > a VM_BUG_ON in vmstat_update because the work item might have been > > > sleeping during the idle period and see the cpu_stat_off updated after > > > the wake up. The VM_BUG_ON is therefore misleading and no more > > > appropriate. Moreover it doesn't really suite any protection from real > > > bugs because vmstat_shepherd will simply reschedule the vmstat_work > > > anytime it sees a particular cpu set or vmstat_update would do the same > > > from the worker context directly. Even when the two would race the > > > result wouldn't be incorrect as the counters update is fully idempotent. > > > > > > Hmmm... the vmstat_update can be interrupted while running and the cpu put > > into idle mode? If vmstat_update is running then the cpu is not idle but > > running code. If this is really going on then there is other stuff wrong > > with the idling logic. > > The vmstat update might be still waiting for its timer, idle mode started > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the > idle terminates and the originally schedule vmstate_update executes it > sees the bit set and BUG_ON. Ok so we are going into idle mode and the vmstat_update timer is pending. Then the timer will not fire since going idle switches preemption off. quiet_vmstat will run without the chance of running vmstat_update We could be going idle and not have disabled preemption yet. Then vmstat_update will run. On return to the idling operation preemption will be disabled and quiet_vmstat() will be run. I do not see how these two things could race. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-21 15:45 ` Christoph Lameter @ 2016-01-21 16:51 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-21 16:51 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu 21-01-16 09:45:12, Christoph Lameter wrote: > On Thu, 21 Jan 2016, Michal Hocko wrote: [...] > > The vmstat update might be still waiting for its timer, idle mode started > > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the > > idle terminates and the originally schedule vmstate_update executes it > > sees the bit set and BUG_ON. > > Ok so we are going into idle mode and the vmstat_update timer is pending. > Then the timer will not fire since going idle switches preemption off. > quiet_vmstat will run without the chance of running vmstat_update > > We could be going idle and not have disabled preemption yet. Then > vmstat_update will run. On return to the idling operation preemption will > be disabled and quiet_vmstat() will be run. > > I do not see how these two things could race. It goes like this: CPU0: CPU1 vmstat_update cpumask_test_and_set_cpu (0->1) [...] vmstat_shepherd <enter idle> cpumask_test_and_clear_cpu(CPU0) (1->0) quiet_vmstat cpumask_test_and_set_cpu (0->1) queue_delayed_work_on(CPU0) refresh_cpu_vm_stats() [...] vmstat_update nothing_to_do cpumask_test_and_set_cpu (1->1) VM_BUG_ON Or am I missing something? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-21 16:51 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-21 16:51 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu 21-01-16 09:45:12, Christoph Lameter wrote: > On Thu, 21 Jan 2016, Michal Hocko wrote: [...] > > The vmstat update might be still waiting for its timer, idle mode started > > and kick vmstat_update which might cpumask_test_and_set_cpu. Once the > > idle terminates and the originally schedule vmstate_update executes it > > sees the bit set and BUG_ON. > > Ok so we are going into idle mode and the vmstat_update timer is pending. > Then the timer will not fire since going idle switches preemption off. > quiet_vmstat will run without the chance of running vmstat_update > > We could be going idle and not have disabled preemption yet. Then > vmstat_update will run. On return to the idling operation preemption will > be disabled and quiet_vmstat() will be run. > > I do not see how these two things could race. It goes like this: CPU0: CPU1 vmstat_update cpumask_test_and_set_cpu (0->1) [...] vmstat_shepherd <enter idle> cpumask_test_and_clear_cpu(CPU0) (1->0) quiet_vmstat cpumask_test_and_set_cpu (0->1) queue_delayed_work_on(CPU0) refresh_cpu_vm_stats() [...] vmstat_update nothing_to_do cpumask_test_and_set_cpu (1->1) VM_BUG_ON Or am I missing something? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-21 16:51 ` Michal Hocko @ 2016-01-21 17:38 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-21 17:38 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu, 21 Jan 2016, Michal Hocko wrote: > It goes like this: > CPU0: CPU1 > vmstat_update > cpumask_test_and_set_cpu (0->1) > [...] > vmstat_shepherd > <enter idle> cpumask_test_and_clear_cpu(CPU0) (1->0) > quiet_vmstat > cpumask_test_and_set_cpu (0->1) > queue_delayed_work_on(CPU0) > refresh_cpu_vm_stats() > [...] > vmstat_update > nothing_to_do > cpumask_test_and_set_cpu (1->1) > VM_BUG_ON > > Or am I missing something? Ok then the following should fix it: Subject: vmstat: Queue work before clearing cpu_stat_off There is a race between vmstat_shepherd and quiet_vmstat() because the responsibility for checking for counter updates changes depending on the state of teh bit in cpu_stat_off. So queue the work before changing state of the bit in vmstat_shepherd. That way quiet_vmstat is guaranteed to remove the work request when clearing the bit and the bug in vmstat_update wont trigger anymore. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_ get_online_cpus(); /* Check processors whose vmstat worker threads have been disabled */ for_each_cpu(cpu, cpu_stat_off) - if (need_update(cpu) && - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) + if (need_update(cpu)) { queue_delayed_work_on(cpu, vmstat_wq, &per_cpu(vmstat_work, cpu), 0); + cpumask_clear_cpu(smp_processor_id(), cpu_stat_off); + } + put_online_cpus(); schedule_delayed_work(&shepherd, ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-21 17:38 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-21 17:38 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu, 21 Jan 2016, Michal Hocko wrote: > It goes like this: > CPU0: CPU1 > vmstat_update > cpumask_test_and_set_cpu (0->1) > [...] > vmstat_shepherd > <enter idle> cpumask_test_and_clear_cpu(CPU0) (1->0) > quiet_vmstat > cpumask_test_and_set_cpu (0->1) > queue_delayed_work_on(CPU0) > refresh_cpu_vm_stats() > [...] > vmstat_update > nothing_to_do > cpumask_test_and_set_cpu (1->1) > VM_BUG_ON > > Or am I missing something? Ok then the following should fix it: Subject: vmstat: Queue work before clearing cpu_stat_off There is a race between vmstat_shepherd and quiet_vmstat() because the responsibility for checking for counter updates changes depending on the state of teh bit in cpu_stat_off. So queue the work before changing state of the bit in vmstat_shepherd. That way quiet_vmstat is guaranteed to remove the work request when clearing the bit and the bug in vmstat_update wont trigger anymore. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_ get_online_cpus(); /* Check processors whose vmstat worker threads have been disabled */ for_each_cpu(cpu, cpu_stat_off) - if (need_update(cpu) && - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) + if (need_update(cpu)) { queue_delayed_work_on(cpu, vmstat_wq, &per_cpu(vmstat_work, cpu), 0); + cpumask_clear_cpu(smp_processor_id(), cpu_stat_off); + } + put_online_cpus(); schedule_delayed_work(&shepherd, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-21 17:38 ` Christoph Lameter @ 2016-01-22 11:00 ` Shiraz Hashim -1 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-22 11:00 UTC (permalink / raw) To: Christoph Lameter Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton On Thu, Jan 21, 2016 at 11:08 PM, Christoph Lameter <cl@linux.com> wrote: > Subject: vmstat: Queue work before clearing cpu_stat_off > > There is a race between vmstat_shepherd and quiet_vmstat() because > the responsibility for checking for counter updates changes depending > on the state of teh bit in cpu_stat_off. So queue the work before > changing state of the bit in vmstat_shepherd. That way quiet_vmstat > is guaranteed to remove the work request when clearing the bit and the > bug in vmstat_update wont trigger anymore. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_ > get_online_cpus(); > /* Check processors whose vmstat worker threads have been disabled */ > for_each_cpu(cpu, cpu_stat_off) > - if (need_update(cpu) && > - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) > + if (need_update(cpu)) { > > queue_delayed_work_on(cpu, vmstat_wq, > &per_cpu(vmstat_work, cpu), 0); > > + cpumask_clear_cpu(smp_processor_id(), cpu_stat_off); > + } > + > put_online_cpus(); > > schedule_delayed_work(&shepherd, This can alternatively lead to following where vmstat may not be scheduled for cpu when it is back from idle. CPU0: CPU1: vmstat_shepherd <enter idle> queue_delayed_work_on(CPU0) quiet_vmstat cancel_delayed_work cpumask_test_and_set_cpu (0->1) cpumask_clear_cpu(CPU0) (1->0) -- regards Shiraz Hashim ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-22 11:00 ` Shiraz Hashim 0 siblings, 0 replies; 114+ messages in thread From: Shiraz Hashim @ 2016-01-22 11:00 UTC (permalink / raw) To: Christoph Lameter Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton On Thu, Jan 21, 2016 at 11:08 PM, Christoph Lameter <cl@linux.com> wrote: > Subject: vmstat: Queue work before clearing cpu_stat_off > > There is a race between vmstat_shepherd and quiet_vmstat() because > the responsibility for checking for counter updates changes depending > on the state of teh bit in cpu_stat_off. So queue the work before > changing state of the bit in vmstat_shepherd. That way quiet_vmstat > is guaranteed to remove the work request when clearing the bit and the > bug in vmstat_update wont trigger anymore. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1480,12 +1480,14 @@ static void vmstat_shepherd(struct work_ > get_online_cpus(); > /* Check processors whose vmstat worker threads have been disabled */ > for_each_cpu(cpu, cpu_stat_off) > - if (need_update(cpu) && > - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) > + if (need_update(cpu)) { > > queue_delayed_work_on(cpu, vmstat_wq, > &per_cpu(vmstat_work, cpu), 0); > > + cpumask_clear_cpu(smp_processor_id(), cpu_stat_off); > + } > + > put_online_cpus(); > > schedule_delayed_work(&shepherd, This can alternatively lead to following where vmstat may not be scheduled for cpu when it is back from idle. CPU0: CPU1: vmstat_shepherd <enter idle> queue_delayed_work_on(CPU0) quiet_vmstat cancel_delayed_work cpumask_test_and_set_cpu (0->1) cpumask_clear_cpu(CPU0) (1->0) -- regards Shiraz Hashim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-21 17:38 ` Christoph Lameter @ 2016-01-22 14:04 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-22 14:04 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu 21-01-16 11:38:46, Christoph Lameter wrote: > On Thu, 21 Jan 2016, Michal Hocko wrote: > > > It goes like this: > > CPU0: CPU1 > > vmstat_update > > cpumask_test_and_set_cpu (0->1) > > [...] > > vmstat_shepherd > > <enter idle> cpumask_test_and_clear_cpu(CPU0) (1->0) > > quiet_vmstat > > cpumask_test_and_set_cpu (0->1) > > queue_delayed_work_on(CPU0) > > refresh_cpu_vm_stats() > > [...] > > vmstat_update > > nothing_to_do > > cpumask_test_and_set_cpu (1->1) > > VM_BUG_ON > > > > Or am I missing something? > > Ok then the following should fix it: Wouldn't it be much more easier and simply get rid of the VM_BUG_ON? What is the point of keeping it in the first place. The code can perfectly cope with the race. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-22 14:04 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-22 14:04 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Thu 21-01-16 11:38:46, Christoph Lameter wrote: > On Thu, 21 Jan 2016, Michal Hocko wrote: > > > It goes like this: > > CPU0: CPU1 > > vmstat_update > > cpumask_test_and_set_cpu (0->1) > > [...] > > vmstat_shepherd > > <enter idle> cpumask_test_and_clear_cpu(CPU0) (1->0) > > quiet_vmstat > > cpumask_test_and_set_cpu (0->1) > > queue_delayed_work_on(CPU0) > > refresh_cpu_vm_stats() > > [...] > > vmstat_update > > nothing_to_do > > cpumask_test_and_set_cpu (1->1) > > VM_BUG_ON > > > > Or am I missing something? > > Ok then the following should fix it: Wouldn't it be much more easier and simply get rid of the VM_BUG_ON? What is the point of keeping it in the first place. The code can perfectly cope with the race. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-22 14:04 ` Michal Hocko @ 2016-01-22 16:07 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-22 16:07 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri, 22 Jan 2016, Michal Hocko wrote: > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON? > What is the point of keeping it in the first place. The code can > perfectly cope with the race. Ok then lets do that. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-22 16:07 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-22 16:07 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri, 22 Jan 2016, Michal Hocko wrote: > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON? > What is the point of keeping it in the first place. The code can > perfectly cope with the race. Ok then lets do that. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-22 16:07 ` Christoph Lameter @ 2016-01-22 16:12 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-22 16:12 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri 22-01-16 10:07:01, Christoph Lameter wrote: > On Fri, 22 Jan 2016, Michal Hocko wrote: > > > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON? > > What is the point of keeping it in the first place. The code can > > perfectly cope with the race. > > Ok then lets do that. Could you repost the patch with the updated description? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-22 16:12 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-22 16:12 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri 22-01-16 10:07:01, Christoph Lameter wrote: > On Fri, 22 Jan 2016, Michal Hocko wrote: > > > Wouldn't it be much more easier and simply get rid of the VM_BUG_ON? > > What is the point of keeping it in the first place. The code can > > perfectly cope with the race. > > Ok then lets do that. Could you repost the patch with the updated description? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-22 16:12 ` Michal Hocko @ 2016-01-22 16:46 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-22 16:46 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri, 22 Jan 2016, Michal Hocko wrote: > Could you repost the patch with the updated description? Subject: vmstat: Remove BUG_ON from vmstat_update If we detect that there is nothing to do just set the flag and do not check if it was already set before. Races really do not matter. If the flag is set by any code then the shepherd will start dealing with the situation and reenable the vmstat workers when necessary again. Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") quiet_vmstat might update cpu_stat_off and mark a particular cpu to be handled by vmstat_shepherd. This might trigger a VM_BUG_ON in vmstat_update because the work item might have been sleeping during the idle period and see the cpu_stat_off updated after the wake up. The VM_BUG_ON is therefore misleading and no more appropriate. Moreover it doesn't really suite any protection from real bugs because vmstat_shepherd will simply reschedule the vmstat_work anytime it sees a particular cpu set or vmstat_update would do the same from the worker context directly. Even when the two would race the result wouldn't be incorrect as the counters update is fully idempotent. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st * Defer the checking for differentials to the * shepherd thread on a different processor. */ - int r; - /* - * Shepherd work thread does not race since it never - * changes the bit if its zero but the cpu - * online / off line code may race if - * worker threads are still allowed during - * shutdown / startup. - */ - r = cpumask_test_and_set_cpu(smp_processor_id(), - cpu_stat_off); - VM_BUG_ON(r); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } } ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-22 16:46 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-22 16:46 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri, 22 Jan 2016, Michal Hocko wrote: > Could you repost the patch with the updated description? Subject: vmstat: Remove BUG_ON from vmstat_update If we detect that there is nothing to do just set the flag and do not check if it was already set before. Races really do not matter. If the flag is set by any code then the shepherd will start dealing with the situation and reenable the vmstat workers when necessary again. Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") quiet_vmstat might update cpu_stat_off and mark a particular cpu to be handled by vmstat_shepherd. This might trigger a VM_BUG_ON in vmstat_update because the work item might have been sleeping during the idle period and see the cpu_stat_off updated after the wake up. The VM_BUG_ON is therefore misleading and no more appropriate. Moreover it doesn't really suite any protection from real bugs because vmstat_shepherd will simply reschedule the vmstat_work anytime it sees a particular cpu set or vmstat_update would do the same from the worker context directly. Even when the two would race the result wouldn't be incorrect as the counters update is fully idempotent. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/mm/vmstat.c =================================================================== --- linux.orig/mm/vmstat.c +++ linux/mm/vmstat.c @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st * Defer the checking for differentials to the * shepherd thread on a different processor. */ - int r; - /* - * Shepherd work thread does not race since it never - * changes the bit if its zero but the cpu - * online / off line code may race if - * worker threads are still allowed during - * shutdown / startup. - */ - r = cpumask_test_and_set_cpu(smp_processor_id(), - cpu_stat_off); - VM_BUG_ON(r); + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); } } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-22 16:46 ` Christoph Lameter @ 2016-01-22 17:12 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-22 17:12 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri 22-01-16 10:46:14, Christoph Lameter wrote: > On Fri, 22 Jan 2016, Michal Hocko wrote: > > > Could you repost the patch with the updated description? > > Subject: vmstat: Remove BUG_ON from vmstat_update > > If we detect that there is nothing to do just set the flag and do not check > if it was already set before. Races really do not matter. If the flag is > set by any code then the shepherd will start dealing with the situation > and reenable the vmstat workers when necessary again. > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > particular cpu to be handled by vmstat_shepherd. This might trigger > a VM_BUG_ON in vmstat_update because the work item might have been > sleeping during the idle period and see the cpu_stat_off updated after > the wake up. The VM_BUG_ON is therefore misleading and no more > appropriate. Moreover it doesn't really suite any protection from real > bugs because vmstat_shepherd will simply reschedule the vmstat_work > anytime it sees a particular cpu set or vmstat_update would do the same > from the worker context directly. Even when the two would race the > result wouldn't be incorrect as the counters update is fully idempotent. > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle" Reported-by: Sasha Levin <sasha.levin@oracle.com> Would be appropriate IMO > Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st > * Defer the checking for differentials to the > * shepherd thread on a different processor. > */ > - int r; > - /* > - * Shepherd work thread does not race since it never > - * changes the bit if its zero but the cpu > - * online / off line code may race if > - * worker threads are still allowed during > - * shutdown / startup. > - */ > - r = cpumask_test_and_set_cpu(smp_processor_id(), > - cpu_stat_off); > - VM_BUG_ON(r); > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > } > } -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-22 17:12 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-22 17:12 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Fri 22-01-16 10:46:14, Christoph Lameter wrote: > On Fri, 22 Jan 2016, Michal Hocko wrote: > > > Could you repost the patch with the updated description? > > Subject: vmstat: Remove BUG_ON from vmstat_update > > If we detect that there is nothing to do just set the flag and do not check > if it was already set before. Races really do not matter. If the flag is > set by any code then the shepherd will start dealing with the situation > and reenable the vmstat workers when necessary again. > > Since 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and > shut down on idle") quiet_vmstat might update cpu_stat_off and mark a > particular cpu to be handled by vmstat_shepherd. This might trigger > a VM_BUG_ON in vmstat_update because the work item might have been > sleeping during the idle period and see the cpu_stat_off updated after > the wake up. The VM_BUG_ON is therefore misleading and no more > appropriate. Moreover it doesn't really suite any protection from real > bugs because vmstat_shepherd will simply reschedule the vmstat_work > anytime it sees a particular cpu set or vmstat_update would do the same > from the worker context directly. Even when the two would race the > result wouldn't be incorrect as the counters update is fully idempotent. > Fixes: 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle" Reported-by: Sasha Levin <sasha.levin@oracle.com> Would be appropriate IMO > Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > > Index: linux/mm/vmstat.c > =================================================================== > --- linux.orig/mm/vmstat.c > +++ linux/mm/vmstat.c > @@ -1408,17 +1408,7 @@ static void vmstat_update(struct work_st > * Defer the checking for differentials to the > * shepherd thread on a different processor. > */ > - int r; > - /* > - * Shepherd work thread does not race since it never > - * changes the bit if its zero but the cpu > - * online / off line code may race if > - * worker threads are still allowed during > - * shutdown / startup. > - */ > - r = cpumask_test_and_set_cpu(smp_processor_id(), > - cpu_stat_off); > - VM_BUG_ON(r); > + cpumask_set_cpu(smp_processor_id(), cpu_stat_off); > } > } -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-22 16:46 ` Christoph Lameter (?) (?) @ 2016-01-23 16:21 ` Mike Galbraith 2016-01-24 0:33 ` Christoph Lameter ` (2 more replies) -1 siblings, 3 replies; 114+ messages in thread From: Mike Galbraith @ 2016-01-23 16:21 UTC (permalink / raw) To: Christoph Lameter; +Cc: Peter Zijlstra, LKML Hi Christoph, While you're fixing that commit up, can you perhaps find a better home for quiet_vmstat()? It not only munches cycles when switching cross -core mightily, for -rt it injects a sleeping lock into the idle task. 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 4.75% [kernel] [k] __schedule 4.70% [kernel] [k] mutex_unlock 3.14% [kernel] [k] __switch_to -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-23 16:21 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith @ 2016-01-24 0:33 ` Christoph Lameter 2016-01-24 2:46 ` Mike Galbraith 2016-01-25 17:42 ` Michal Hocko 2016-01-27 16:48 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko 2 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-24 0:33 UTC (permalink / raw) To: Mike Galbraith; +Cc: Peter Zijlstra, LKML On Sat, 23 Jan 2016, Mike Galbraith wrote: > While you're fixing that commit up, can you perhaps find a better home > for quiet_vmstat()? It not only munches cycles when switching cross > -core mightily, for -rt it injects a sleeping lock into the idle task. Not sure what you are talking about. No sleeping locks are used in quiet_vmstat() nor does it switch across cores. It would be broken if it would do so. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-24 0:33 ` Christoph Lameter @ 2016-01-24 2:46 ` Mike Galbraith 2016-01-24 3:46 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-24 2:46 UTC (permalink / raw) To: Christoph Lameter; +Cc: Peter Zijlstra, LKML On Sat, 2016-01-23 at 18:33 -0600, Christoph Lameter wrote: > On Sat, 23 Jan 2016, Mike Galbraith wrote: > > > While you're fixing that commit up, can you perhaps find a better home > > for quiet_vmstat()? It not only munches cycles when switching cross > > -core mightily, for -rt it injects a sleeping lock into the idle task. > > Not sure what you are talking about. No sleeping locks are used in > quiet_vmstat() nor does it switch across cores. It would be broken if it > would do so. By switching cross-core, I'm referring to scheduling of communicating tasks. The perf top snippet... 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 4.75% [kernel] [k] __schedule 4.70% [kernel] [k] mutex_unlock 3.14% [kernel] [k] __switch_to ... was pipe-test, an unrealistic microbench, but the same will happen at lower frequency in the real world. Here's the sleeping lock for -rt: [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000 [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0 [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88 [ 2.283941] Call Trace: [ 2.284797] [<ffffffff812d20e2>] dump_stack+0x49/0x67 [ 2.285658] [<ffffffff81081095>] ___might_sleep+0xf5/0x180 [ 2.286521] [<ffffffff815ff910>] rt_spin_lock+0x20/0x50 [ 2.287382] [<ffffffff81075919>] try_to_grab_pending+0x69/0x240 [ 2.288239] [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0 [ 2.289094] [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0 [ 2.289949] [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0 [ 2.290800] [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20 [ 2.291647] [<ffffffff81036164>] start_secondary+0x114/0x140 ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-24 2:46 ` Mike Galbraith @ 2016-01-24 3:46 ` Christoph Lameter 2016-01-24 5:36 ` Mike Galbraith 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-24 3:46 UTC (permalink / raw) To: Mike Galbraith; +Cc: Peter Zijlstra, LKML On Sun, 24 Jan 2016, Mike Galbraith wrote: > By switching cross-core, I'm referring to scheduling of communicating > tasks. ??? Its cancelling a work request. That is a "communicating task"? > Here's the sleeping lock for -rt: > > [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 > [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 > [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000 > [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0 > [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88 > [ 2.283941] Call Trace: > [ 2.284797] [<ffffffff812d20e2>] dump_stack+0x49/0x67 > [ 2.285658] [<ffffffff81081095>] ___might_sleep+0xf5/0x180 > [ 2.286521] [<ffffffff815ff910>] rt_spin_lock+0x20/0x50 > [ 2.287382] [<ffffffff81075919>] try_to_grab_pending+0x69/0x240 > [ 2.288239] [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0 OMG cancelling a work request causes a sleeping lock to be taken? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-24 3:46 ` Christoph Lameter @ 2016-01-24 5:36 ` Mike Galbraith 0 siblings, 0 replies; 114+ messages in thread From: Mike Galbraith @ 2016-01-24 5:36 UTC (permalink / raw) To: Christoph Lameter; +Cc: Peter Zijlstra, LKML On Sat, 2016-01-23 at 21:46 -0600, Christoph Lameter wrote: > On Sun, 24 Jan 2016, Mike Galbraith wrote: > > > By switching cross-core, I'm referring to scheduling of communicating > > tasks. > > ??? Its cancelling a work request. That is a "communicating task"? No no no, pipe-test is two tasks playing ping-pong via a pipe. They are about as synchronous as it gets, so each task goes idle at high frequency. Idle is fastpath. > > Here's the sleeping lock for -rt: > > > > [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 > > [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 > > [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000 > > [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0 > > [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88 > > [ 2.283941] Call Trace: > > [ 2.284797] [] dump_stack+0x49/0x67 > > [ 2.285658] [] ___might_sleep+0xf5/0x180 > > [ 2.286521] [] rt_spin_lock+0x20/0x50 > > [ 2.287382] [] try_to_grab_pending+0x69/0x240 > > [ 2.288239] [] cancel_delayed_work+0x26/0xe0 > > OMG cancelling a work request causes a sleeping lock to be taken? Yup. In -rt, spinlocks that are not raw are transformed into rtmutex. -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-23 16:21 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith 2016-01-24 0:33 ` Christoph Lameter @ 2016-01-25 17:42 ` Michal Hocko 2016-01-25 18:02 ` Christoph Lameter 2016-01-27 16:48 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko 2 siblings, 1 reply; 114+ messages in thread From: Michal Hocko @ 2016-01-25 17:42 UTC (permalink / raw) To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > Hi Christoph, > > While you're fixing that commit up, can you perhaps find a better home > for quiet_vmstat()? It not only munches cycles when switching cross > -core mightily, for -rt it injects a sleeping lock into the idle task. > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > 4.75% [kernel] [k] __schedule > 4.70% [kernel] [k] mutex_unlock > 3.14% [kernel] [k] __switch_to Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have such a large footprint. I guess this would be just an expensive noop because we have to check all the zones*counters and do an expensive this_cpu_xchg. Is the whole deferred thing worth this overhead? 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") doesn't talk about any numbers and neither does 39bf6270f524 ("VM statistics: Make timer deferrable"). But even when refresh_cpu_vm_stats is made more effective there would still be the problem for RT and the work canceling, though. We would need much more changes to make this RT ready (both timer and WQ code would need some changes AFAIU). Unless there is a clear and huge win from doing the vmstat update deferrable then I think a revert is more appropriate IMHO. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-25 17:42 ` Michal Hocko @ 2016-01-25 18:02 ` Christoph Lameter 2016-01-25 20:13 ` Michal Hocko 2016-01-26 2:14 ` Mike Galbraith 0 siblings, 2 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-25 18:02 UTC (permalink / raw) To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Mon, 25 Jan 2016, Michal Hocko wrote: > On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > > Hi Christoph, > > > > While you're fixing that commit up, can you perhaps find a better home > > for quiet_vmstat()? It not only munches cycles when switching cross > > -core mightily, for -rt it injects a sleeping lock into the idle task. > > > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > > 4.75% [kernel] [k] __schedule > > 4.70% [kernel] [k] mutex_unlock > > 3.14% [kernel] [k] __switch_to > > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have > such a large footprint. I guess this would be just an expensive noop > because we have to check all the zones*counters and do an expensive > this_cpu_xchg. Is the whole deferred thing worth this overhead? Why would the deferring cause this overhead? Also there is no cross core activity from quiet_vmstat(). It simply disables the local vmstat updates. > Unless there is a clear and huge win from doing the vmstat update > deferrable then I think a revert is more appropriate IMHO. It reduces the OS events that the application experiences by folding it into the tick events. If its not deferrable then a timer event will be generated in addition to the tick. We do not want that. Workqueues are used in many places. If RT can sleep within workqueue management functions then spinlocks cannot be taken anymore and there may be issues with preemption. The regression that I know of (independent of "RT") is due as far as I know due to the switch of the parameters of some vmstat functions to 64 bit instead of 32 bit. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-25 18:02 ` Christoph Lameter @ 2016-01-25 20:13 ` Michal Hocko 2016-01-26 16:25 ` Christoph Lameter 2016-01-26 2:14 ` Mike Galbraith 1 sibling, 1 reply; 114+ messages in thread From: Michal Hocko @ 2016-01-25 20:13 UTC (permalink / raw) To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Mon 25-01-16 12:02:06, Christoph Lameter wrote: > On Mon, 25 Jan 2016, Michal Hocko wrote: > > > On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > > > Hi Christoph, > > > > > > While you're fixing that commit up, can you perhaps find a better home > > > for quiet_vmstat()? It not only munches cycles when switching cross > > > -core mightily, for -rt it injects a sleeping lock into the idle task. > > > > > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > > > 4.75% [kernel] [k] __schedule > > > 4.70% [kernel] [k] mutex_unlock > > > 3.14% [kernel] [k] __switch_to > > > > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have > > such a large footprint. I guess this would be just an expensive noop > > because we have to check all the zones*counters and do an expensive > > this_cpu_xchg. Is the whole deferred thing worth this overhead? > > Why would the deferring cause this overhead? I guess the profile speaks for itself, doesn't it? > Also there is no cross core activity from quiet_vmstat(). It simply > disables the local vmstat updates. It doesn't go cross core but it still does nr_zones * counters atomic ops. > > Unless there is a clear and huge win from doing the vmstat update > > deferrable then I think a revert is more appropriate IMHO. > > It reduces the OS events that the application experiences by folding it > into the tick events. If its not deferrable then a timer event will be > generated in addition to the tick. We do not want that. Yes this is what I have read in the changelog. But "how much" part is really missing. Is this even quantifiable? > Workqueues are used in many places. If RT can sleep within workqueue > management functions then spinlocks cannot be taken anymore and there may > be issues with preemption. RT can sleep in _any_ spinlock except for raw spin locks. Even though the !RT kernel is not sleeping doesn't really matter much because cancel_delayed_work is quite a heavy function which shouldn't be called from the idle context AFAIU. Sure most of the time it will boil down to del_timer but it can hit the slowpath as well if the timer got migrated to a different CPU and we have to race with the WQ pool management IIUC. Maybe this overhead can be reduced by outsourcing the functionality to vmstat_shepherd which can check idle CPUs, cancel the timer for them update the differentials and put them to cpu_stat_off? > The regression that I know of (independent of "RT") is due as far as I > know due to the switch of the parameters of some vmstat functions to 64 > bit instead of 32 bit. I am not sure I am following. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-25 20:13 ` Michal Hocko @ 2016-01-26 16:25 ` Christoph Lameter 2016-01-26 18:31 ` Mike Galbraith 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 16:25 UTC (permalink / raw) To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Mon, 25 Jan 2016, Michal Hocko wrote: > > Why would the deferring cause this overhead? > > I guess the profile speaks for itself, doesn't it? But the system is going idle? Why would this impact performance? > > Also there is no cross core activity from quiet_vmstat(). It simply > > disables the local vmstat updates. > > It doesn't go cross core but it still does nr_zones * counters atomic > ops. If there are updates then yes. > > It reduces the OS events that the application experiences by folding it > > into the tick events. If its not deferrable then a timer event will be > > generated in addition to the tick. We do not want that. > > Yes this is what I have read in the changelog. But "how much" part is > really missing. Is this even quantifiable? Oh yes. If you want to have low latency responses yes. And event like that causes a couple of microseconds delay which will cause a moneytary impact if you have to react to stock trading events. > Maybe this overhead can be reduced by outsourcing the functionality to > vmstat_shepherd which can check idle CPUs, cancel the timer for them > update the differentials and put them to cpu_stat_off? Remote updating of differentials is problematic due to the counters being per cpu values that are expected to only be updated from the cpu that "owns" them. > > The regression that I know of (independent of "RT") is due as far as I > > know due to the switch of the parameters of some vmstat functions to 64 > > bit instead of 32 bit. > > I am not sure I am following. An additional patch was merged for 4.5 that increases the arguments to the counter operations to 64 bit which is known for regressions. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 16:25 ` Christoph Lameter @ 2016-01-26 18:31 ` Mike Galbraith 2016-01-26 18:34 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 18:31 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: Peter Zijlstra, LKML On Tue, 2016-01-26 at 10:25 -0600, Christoph Lameter wrote: > On Mon, 25 Jan 2016, Michal Hocko wrote: > > > > Why would the deferring cause this overhead? > > > > I guess the profile speaks for itself, doesn't it? > > But the system is going idle? Why would this impact performance? We enter/exit idle a lot. Your reluctance to move it seem to suggest that 99.99% of CPUs on the planet chewing up cycles (measured) doing what for most is useless work on every micro-idle is a perfectly fine price to pay to ensure that .01% (or whatever tiny minority) get what they want. I disagree. You're burning electrons for no benefit at all to me on my box. You want to do high speed trading, that's fine, but I expect my box to be able to pop in and out of idle without having to pay a toll to the high speed trading bandits of the world, thank you very much. This specialty thing does not belong in the generic fast path. -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 18:31 ` Mike Galbraith @ 2016-01-26 18:34 ` Christoph Lameter 2016-01-26 18:45 ` Mike Galbraith 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 18:34 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > I disagree. You're burning electrons for no benefit at all to me on my > box. You want to do high speed trading, that's fine, but I expect my > box to be able to pop in and out of idle without having to pay a toll > to the high speed trading bandits of the world, thank you very much. > > This specialty thing does not belong in the generic fast path. The system going idle is a fastpath. Mind boogling. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 18:34 ` Christoph Lameter @ 2016-01-26 18:45 ` Mike Galbraith 2016-01-26 19:20 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 18:45 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 2016-01-26 at 12:34 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > I disagree. You're burning electrons for no benefit at all to me on my > > box. You want to do high speed trading, that's fine, but I expect my > > box to be able to pop in and out of idle without having to pay a toll > > to the high speed trading bandits of the world, thank you very much. > > > > This specialty thing does not belong in the generic fast path. > > The system going idle is a fastpath. Mind boogling. Hohum, noted. Now what about those cycles, and the sleeping lock you injected for -rt? -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 18:45 ` Mike Galbraith @ 2016-01-26 19:20 ` Christoph Lameter 2016-01-27 3:12 ` Mike Galbraith 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 19:20 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > > The system going idle is a fastpath. Mind boogling. > > Hohum, noted. Now what about those cycles, and the sleeping lock you > injected for -rt? Since we (the NOHZ people) care mostly about NOHZ then lets restrict that to the NOHZ mode. Then it should not affect your load. Subject: Move quiet_vmstat() to NOHZ code quiet_vmstat() seems to cause regressions for some load because the cpu going idle is a "fastpath". Mind boogling. Strange claim. If the system goes idle then it has nothing to do after all But anyways if we shift the quiet_vmstat() into the NOHZ logic when it stops the tick then it will only affect those cores that are setup for NOHZ mode. That is where we want this processing after all to ensure that the OS keeps itself off those cores. Signed-off-by: Christoph Lameter <cl@linux.com> Index: linux/kernel/sched/idle.c =================================================================== --- linux.orig/kernel/sched/idle.c +++ linux/kernel/sched/idle.c @@ -213,7 +213,6 @@ static void cpu_idle_loop(void) */ __current_set_polling(); - quiet_vmstat(); tick_nohz_idle_enter(); while (!need_resched()) { Index: linux/kernel/time/tick-sched.c =================================================================== --- linux.orig/kernel/time/tick-sched.c +++ linux/kernel/time/tick-sched.c @@ -811,6 +811,7 @@ static void __tick_nohz_idle_enter(struc ts->idle_calls++; expires = tick_nohz_stop_sched_tick(ts, now, cpu); + quiet_vmstat(); if (expires.tv64 > 0LL) { ts->idle_sleeps++; ts->idle_expires = expires; ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 19:20 ` Christoph Lameter @ 2016-01-27 3:12 ` Mike Galbraith 2016-01-27 4:15 ` Mike Galbraith 2016-01-27 16:28 ` Christoph Lameter 0 siblings, 2 replies; 114+ messages in thread From: Mike Galbraith @ 2016-01-27 3:12 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML Good morning, On Tue, 2016-01-26 at 13:20 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > The system going idle is a fastpath. Mind boogling. > > > > Hohum, noted. Now what about those cycles, and the sleeping lock you > > injected for -rt? > > Since we (the NOHZ people) care mostly about NOHZ then lets restrict > that to the NOHZ mode. Then it should not affect your load. Tons of folks do have NO_HZ enabled (including me). Isn't there a spot somewhere in NO_HZ_FULL code where it can take up residence? (one with a tad lower maximum call frequency would be good, a nohz_full cpu isn't necessarily being used for pure compute exclusively) > Subject: Move quiet_vmstat() to NOHZ code > > quiet_vmstat() seems to cause regressions for some load because > the cpu going idle is a "fastpath". Mind boogling. Strange claim. > If the system goes idle then it has nothing to do after all > > But anyways if we shift the quiet_vmstat() into the NOHZ logic > when it stops the tick then it will only affect those cores that > are setup for NOHZ mode. That is where we want this processing > after all to ensure that the OS keeps itself off those cores. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > > Index: linux/kernel/sched/idle.c > =================================================================== > --- linux.orig/kernel/sched/idle.c > +++ linux/kernel/sched/idle.c > @@ -213,7 +213,6 @@ static void cpu_idle_loop(void) > */ > > __current_set_polling(); > - quiet_vmstat(); > tick_nohz_idle_enter(); > > while (!need_resched()) { > Index: linux/kernel/time/tick-sched.c > =================================================================== > --- linux.orig/kernel/time/tick-sched.c > +++ linux/kernel/time/tick-sched.c > @@ -811,6 +811,7 @@ static void __tick_nohz_idle_enter(struc > ts->idle_calls++; > > expires = tick_nohz_stop_sched_tick(ts, now, cpu); > + quiet_vmstat(); > if (expires.tv64 > 0LL) { > ts->idle_sleeps++; > ts->idle_expires = expires; ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-27 3:12 ` Mike Galbraith @ 2016-01-27 4:15 ` Mike Galbraith 2016-01-27 16:28 ` Christoph Lameter 1 sibling, 0 replies; 114+ messages in thread From: Mike Galbraith @ 2016-01-27 4:15 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML On Wed, 2016-01-27 at 04:12 +0100, Mike Galbraith wrote: > Good morning, > > On Tue, 2016-01-26 at 13:20 -0600, Christoph Lameter wrote: > > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > > > The system going idle is a fastpath. Mind boogling. > > > > > > Hohum, noted. Now what about those cycles, and the sleeping lock you > > > injected for -rt? > > > > Since we (the NOHZ people) care mostly about NOHZ then lets restrict > > that to the NOHZ mode. Then it should not affect your load. > > Tons of folks do have NO_HZ enabled (including me). Isn't there a spot > somewhere in NO_HZ_FULL code where it can take up residence? (one with > a tad lower maximum call frequency would be good, a nohz_full cpu isn't > necessarily being used for pure compute exclusively) I forgot to mention that the spot you picked is called with irqs disabled. > > Subject: Move quiet_vmstat() to NOHZ code > > > > quiet_vmstat() seems to cause regressions for some load because > > the cpu going idle is a "fastpath". Mind boogling. Strange claim. > > If the system goes idle then it has nothing to do after all Hm, seems I also forgot to say "Hohum, noted..." again. -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-27 3:12 ` Mike Galbraith 2016-01-27 4:15 ` Mike Galbraith @ 2016-01-27 16:28 ` Christoph Lameter 2016-01-28 15:36 ` Frederic Weisbecker 1 sibling, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-27 16:28 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Michal Hocko, Mike Galbraith, Peter Zijlstra, LKML On Wed, 27 Jan 2016, Mike Galbraith wrote: > > Since we (the NOHZ people) care mostly about NOHZ then lets restrict > > that to the NOHZ mode. Then it should not affect your load. > > Tons of folks do have NO_HZ enabled (including me). Isn't there a spot > somewhere in NO_HZ_FULL code where it can take up residence? (one with > a tad lower maximum call frequency would be good, a nohz_full cpu isn't > necessarily being used for pure compute exclusively) Frederic, any idea on where to put quiet_vmstat()? > > Subject: Move quiet_vmstat() to NOHZ code > > > > quiet_vmstat() seems to cause regressions for some load because > > the cpu going idle is a "fastpath". Mind boogling. Strange claim. > > If the system goes idle then it has nothing to do after all > > > > But anyways if we shift the quiet_vmstat() into the NOHZ logic > > when it stops the tick then it will only affect those cores that > > are setup for NOHZ mode. That is where we want this processing > > after all to ensure that the OS keeps itself off those cores. > > > > Signed-off-by: Christoph Lameter <cl@linux.com> > > > > > > Index: linux/kernel/sched/idle.c > > =================================================================== > > --- linux.orig/kernel/sched/idle.c > > +++ linux/kernel/sched/idle.c > > @@ -213,7 +213,6 @@ static void cpu_idle_loop(void) > > */ > > > > __current_set_polling(); > > - quiet_vmstat(); > > tick_nohz_idle_enter(); > > > > while (!need_resched()) { > > Index: linux/kernel/time/tick-sched.c > > =================================================================== > > --- linux.orig/kernel/time/tick-sched.c > > +++ linux/kernel/time/tick-sched.c > > @@ -811,6 +811,7 @@ static void __tick_nohz_idle_enter(struc > > ts->idle_calls++; > > > > expires = tick_nohz_stop_sched_tick(ts, now, cpu); > > + quiet_vmstat(); > > if (expires.tv64 > 0LL) { > > ts->idle_sleeps++; > > ts->idle_expires = expires; > ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-27 16:28 ` Christoph Lameter @ 2016-01-28 15:36 ` Frederic Weisbecker 2016-01-28 16:42 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Frederic Weisbecker @ 2016-01-28 15:36 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Mike Galbraith, Peter Zijlstra, LKML On Wed, Jan 27, 2016 at 10:28:48AM -0600, Christoph Lameter wrote: > On Wed, 27 Jan 2016, Mike Galbraith wrote: > > > > Since we (the NOHZ people) care mostly about NOHZ then lets restrict > > > that to the NOHZ mode. Then it should not affect your load. > > > > Tons of folks do have NO_HZ enabled (including me). Isn't there a spot > > somewhere in NO_HZ_FULL code where it can take up residence? (one with > > a tad lower maximum call frequency would be good, a nohz_full cpu isn't > > necessarily being used for pure compute exclusively) > > Frederic, any idea on where to put quiet_vmstat()? I guess those who want this also want task isolation, right? And Chris integrates it in his patchset. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-28 15:36 ` Frederic Weisbecker @ 2016-01-28 16:42 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-28 16:42 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: Michal Hocko, Mike Galbraith, Peter Zijlstra, LKML On Thu, 28 Jan 2016, Frederic Weisbecker wrote: > > Frederic, any idea on where to put quiet_vmstat()? > > I guess those who want this also want task isolation, right? And Chris integrates > it in his patchset. Ok. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-25 18:02 ` Christoph Lameter 2016-01-25 20:13 ` Michal Hocko @ 2016-01-26 2:14 ` Mike Galbraith 2016-01-26 2:25 ` Mike Galbraith 2016-01-26 16:26 ` Christoph Lameter 1 sibling, 2 replies; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 2:14 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: Peter Zijlstra, LKML On Mon, 2016-01-25 at 12:02 -0600, Christoph Lameter wrote: > On Mon, 25 Jan 2016, Michal Hocko wrote: > > > On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > > > Hi Christoph, > > > > > > While you're fixing that commit up, can you perhaps find a better home > > > for quiet_vmstat()? It not only munches cycles when switching cross > > > -core mightily, for -rt it injects a sleeping lock into the idle task. > > > > > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > > > 4.75% [kernel] [k] __schedule > > > 4.70% [kernel] [k] mutex_unlock > > > 3.14% [kernel] [k] __switch_to > > > > Hmm, I wouldn't have expected that refresh_cpu_vm_stats could have > > such a large footprint. I guess this would be just an expensive noop > > because we have to check all the zones*counters and do an expensive > > this_cpu_xchg. Is the whole deferred thing worth this overhead? > > Why would the deferring cause this overhead? Because we schedule to idle cores aggressively, thus we may pop in and out of idle at high frequency. > Also there is no cross core activity from quiet_vmstat(). It simply > disables the local vmstat updates. Again, the cross core activity is not due to quiet_vmstat(), it is due to pipe-test threads running on two cores, and meeting quiet_vmstat() at high frequency. > > Unless there is a clear and huge win from doing the vmstat update > > deferrable then I think a revert is more appropriate IMHO. > > It reduces the OS events that the application experiences by folding it > into the tick events. If its not deferrable then a timer event will be > generated in addition to the tick. We do not want that. Perf and RT say we don't want quiet_vmstat() in the idle loop either. -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 2:14 ` Mike Galbraith @ 2016-01-26 2:25 ` Mike Galbraith 2016-01-26 16:26 ` Christoph Lameter 2016-01-26 16:26 ` Christoph Lameter 1 sibling, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 2:25 UTC (permalink / raw) To: Christoph Lameter, Michal Hocko; +Cc: Peter Zijlstra, LKML On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote: > Perf and RT say we don't want quiet_vmstat() in the idle loop either. BTW, the perf numbers were not from an RT kernel, they were from my PREEMPT_VOLUNTARY desktop kernel. -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 2:25 ` Mike Galbraith @ 2016-01-26 16:26 ` Christoph Lameter 2016-01-26 17:39 ` Mike Galbraith 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 16:26 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote: > > > Perf and RT say we don't want quiet_vmstat() in the idle loop either. > > BTW, the perf numbers were not from an RT kernel, they were from my > PREEMPT_VOLUNTARY desktop kernel. Can we move quiet_vmstat() elsewhere after we have checked that really nothing else is going on soon? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 16:26 ` Christoph Lameter @ 2016-01-26 17:39 ` Mike Galbraith 2016-01-26 18:19 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 17:39 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote: > > > > > Perf and RT say we don't want quiet_vmstat() in the idle loop > > > either. > > > > BTW, the perf numbers were not from an RT kernel, they were from my > > PREEMPT_VOLUNTARY desktop kernel. > > Can we move quiet_vmstat() elsewhere after we have checked that really > nothing else is going on soon? How would you check? Precognition doesn't work for mortals. -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 17:39 ` Mike Galbraith @ 2016-01-26 18:19 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 18:19 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > > On Tue, 2016-01-26 at 03:14 +0100, Mike Galbraith wrote: > > > > > > > Perf and RT say we don't want quiet_vmstat() in the idle loop > > > > either. > > > > > > BTW, the perf numbers were not from an RT kernel, they were from my > > > PREEMPT_VOLUNTARY desktop kernel. > > > > Can we move quiet_vmstat() elsewhere after we have checked that really > > nothing else is going on soon? > > How would you check? Precognition doesn't work for mortals. Dont we have some decision mechanism to go into higher levels of power savings when the system is idle for longer times? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 2:14 ` Mike Galbraith 2016-01-26 2:25 ` Mike Galbraith @ 2016-01-26 16:26 ` Christoph Lameter 2016-01-26 17:08 ` Mike Galbraith 1 sibling, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 16:26 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > > Why would the deferring cause this overhead? > > Because we schedule to idle cores aggressively, thus we may pop in and > out of idle at high frequency. Whats the point of going idle if you have things to do soon? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 16:26 ` Christoph Lameter @ 2016-01-26 17:08 ` Mike Galbraith 2016-01-26 18:22 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 17:08 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > Why would the deferring cause this overhead? > > > > Because we schedule to idle cores aggressively, thus we may pop in and > > out of idle at high frequency. > > Whats the point of going idle if you have things to do soon? When a task schedules off, how do you know it'll be back at all, much less soon? -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 17:08 ` Mike Galbraith @ 2016-01-26 18:22 ` Christoph Lameter 2016-01-26 19:09 ` Mike Galbraith 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 18:22 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > > > Why would the deferring cause this overhead? > > > > > > Because we schedule to idle cores aggressively, thus we may pop in and > > > out of idle at high frequency. > > > > Whats the point of going idle if you have things to do soon? > > When a task schedules off, how do you know it'll be back at all, much > less soon? Ok so you are running an artificial benchmark that always gets the system running again when it decides to go idle? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 18:22 ` Christoph Lameter @ 2016-01-26 19:09 ` Mike Galbraith 2016-01-26 19:22 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Mike Galbraith @ 2016-01-26 19:09 UTC (permalink / raw) To: Christoph Lameter; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 2016-01-26 at 12:22 -0600, Christoph Lameter wrote: > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > On Tue, 2016-01-26 at 10:26 -0600, Christoph Lameter wrote: > > > On Tue, 26 Jan 2016, Mike Galbraith wrote: > > > > > > > > Why would the deferring cause this overhead? > > > > > > > > Because we schedule to idle cores aggressively, thus we may pop > > > > in and > > > > out of idle at high frequency. > > > > > > Whats the point of going idle if you have things to do soon? > > > > When a task schedules off, how do you know it'll be back at all, > > much > > less soon? > > Ok so you are running an artificial benchmark that always gets the > system running again when it decides to go idle? The benchmark does not alter the cycle expenditure per event. The real world will pay the toll less frequently than the artificial benchmark, yes, but it will pay nonetheless, and for most, needlessly. Or? -Mike ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) 2016-01-26 19:09 ` Mike Galbraith @ 2016-01-26 19:22 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-26 19:22 UTC (permalink / raw) To: Mike Galbraith; +Cc: Michal Hocko, Peter Zijlstra, LKML On Tue, 26 Jan 2016, Mike Galbraith wrote: > > Ok so you are running an artificial benchmark that always gets the > > system running again when it decides to go idle? > > The benchmark does not alter the cycle expenditure per event. The real > world will pay the toll less frequently than the artificial benchmark, > yes, but it will pay nonetheless, and for most, needlessly. Or? Well if it does not pay it there then it is going to pay for it later when the vmstat_updater needs to cause a wakeup to fold the data. Also not folding the vmstat updates increases the time that minor updates are deferred and thus impacts the accuracy of the global vm counters. That seems to have been important recently. ^ permalink raw reply [flat|nested] 114+ messages in thread
* [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-23 16:21 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith 2016-01-24 0:33 ` Christoph Lameter 2016-01-25 17:42 ` Michal Hocko @ 2016-01-27 16:48 ` Michal Hocko 2016-01-27 17:03 ` Mike Galbraith 2016-01-27 18:26 ` Christoph Lameter 2 siblings, 2 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-27 16:48 UTC (permalink / raw) To: Mike Galbraith; +Cc: Christoph Lameter, Peter Zijlstra, LKML On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > Hi Christoph, > > While you're fixing that commit up, can you perhaps find a better home > for quiet_vmstat()? It not only munches cycles when switching cross > -core mightily, for -rt it injects a sleeping lock into the idle task. > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > 4.75% [kernel] [k] __schedule > 4.70% [kernel] [k] mutex_unlock > 3.14% [kernel] [k] __switch_to What about the following fix? --- >From c74a04c4fdfe1fa67933bb1ac83d3de0532aaab2 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@suse.com> Date: Wed, 27 Jan 2016 17:24:22 +0100 Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter Mike has reported a considerable overhead of refresh_cpu_vm_stats from the idle entry during pipe test: 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 4.75% [kernel] [k] __schedule 4.70% [kernel] [k] mutex_unlock 3.14% [kernel] [k] __switch_to This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") which has placed quiet_vmstat into cpu_idle_loop. The main reason here seems to be that the idle entry has to get over all zones and perform atomic operations for each vmstat entry even though there might be no per cpu diffs. This is a pointless overhead for _each_ idle entry. Make sure that quiet_vmstat is as light as possible. First of all it doesn't make any sense to do any local sync if the current cpu is already set in oncpu_stat_off because vmstat_update puts itself there only if there is nothing to do. Then we can check need_update which should be a cheap way to check for potential per-cpu diffs and only then do refresh_cpu_vm_stats. The original patch also did cancel_delayed_work which we are not doing here. There are two reasons for that. Firstly cancel_delayed_work from idle context will blow up on RT kernels (reported by Mike): [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000 [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0 [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88 [ 2.283941] Call Trace: [ 2.284797] [<ffffffff812d20e2>] dump_stack+0x49/0x67 [ 2.285658] [<ffffffff81081095>] ___might_sleep+0xf5/0x180 [ 2.286521] [<ffffffff815ff910>] rt_spin_lock+0x20/0x50 [ 2.287382] [<ffffffff81075919>] try_to_grab_pending+0x69/0x240 [ 2.288239] [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0 [ 2.289094] [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0 [ 2.289949] [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0 [ 2.290800] [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20 [ 2.291647] [<ffffffff81036164>] start_secondary+0x114/0x140 And secondly, even on !RT kernels it might add some non trivial overhead which is not necessary. Even if the vmstat worker wakes up and preempts idle then it will be most likely a single shot noop because the stats were already synced and so it would end up on the oncpu_stat_off anyway. We just need to teach both vmstat_shepherd and vmstat_update to stop scheduling the worker if there is nothing to do. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/vmstat.c | 60 ++++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 20 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c index 40b2c74ddf16..7747f85537b6 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct *w) * Counters were updated so we expect more updates * to occur in the future. Keep on running the * update worker thread. + * If we were marked on cpu_stat_off clear the flag + * so that vmstat_shepherd doesn't schedule us again. */ - queue_delayed_work_on(smp_processor_id(), vmstat_wq, - this_cpu_ptr(&vmstat_work), - round_jiffies_relative(sysctl_stat_interval)); + if (!cpumask_test_and_clear_cpu(smp_processor_id(), + cpu_stat_off)) { + queue_delayed_work_on(smp_processor_id(), vmstat_wq, + this_cpu_ptr(&vmstat_work), + round_jiffies_relative(sysctl_stat_interval)); + } } else { /* * We did not update any counters so the app may be in @@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct *w) * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. */ -void quiet_vmstat(void) -{ - if (system_state != SYSTEM_RUNNING) - return; - - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); -} - /* * Check if the diffs for a certain cpu indicate that * an update is needed. @@ -1452,6 +1445,30 @@ static bool need_update(int cpu) return false; } +void quiet_vmstat(void) +{ + if (system_state != SYSTEM_RUNNING) + return; + + /* + * If we are already in hands of the shepherd then there + * is nothing for us to do here. + */ + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) + return; + + if (!need_update(smp_processor_id())) + return; + + /* + * Just refresh counters and do not care about the pending delayed + * vmstat_update. It doesn't fire that often to matter and canceling + * it would be too expensive from this path. + * vmstat_shepherd will take care about that for us. + */ + refresh_cpu_vm_stats(false); +} + /* * Shepherd worker thread that checks the @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w) get_online_cpus(); /* Check processors whose vmstat worker threads have been disabled */ for_each_cpu(cpu, cpu_stat_off) - if (need_update(cpu) && - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) - - queue_delayed_work_on(cpu, vmstat_wq, - &per_cpu(vmstat_work, cpu), 0); + if (need_update(cpu)) { + if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) + queue_delayed_work_on(cpu, vmstat_wq, + &per_cpu(vmstat_work, cpu), 0); + } else { + cpumask_set_cpu(cpu, cpu_stat_off); + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); + } put_online_cpus(); -- 2.7.0.rc3 -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-27 16:48 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko @ 2016-01-27 17:03 ` Mike Galbraith 2016-01-27 18:26 ` Christoph Lameter 1 sibling, 0 replies; 114+ messages in thread From: Mike Galbraith @ 2016-01-27 17:03 UTC (permalink / raw) To: Michal Hocko; +Cc: Christoph Lameter, Peter Zijlstra, LKML On Wed, 2016-01-27 at 17:48 +0100, Michal Hocko wrote: > On Sat 23-01-16 17:21:55, Mike Galbraith wrote: > > Hi Christoph, > > > > While you're fixing that commit up, can you perhaps find a better > > home > > for quiet_vmstat()? It not only munches cycles when switching > > cross > > -core mightily, for -rt it injects a sleeping lock into the idle > > task. > > > > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > > 4.75% [kernel] [k] __schedule > > 4.70% [kernel] [k] mutex_unlock > > 3.14% [kernel] [k] __switch_to > > What about the following fix? I haven't done _extensive_ testing, but 4.5-rc1-rt (with NO_HZ_FULL enabling hacks restored, as it's otherwise a dead .config item) is gripe free whether nohz_full CPUs are in use or not, and overhead went away in it and desktop configured master kernel (PREEMPT_VOLUNTARY). > --- > From c74a04c4fdfe1fa67933bb1ac83d3de0532aaab2 Mon Sep 17 00:00:00 > 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Wed, 27 Jan 2016 17:24:22 +0100 > Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter > > Mike has reported a considerable overhead of refresh_cpu_vm_stats > from > the idle entry during pipe test: > 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 > 4.75% [kernel] [k] __schedule > 4.70% [kernel] [k] mutex_unlock > 3.14% [kernel] [k] __switch_to > > This is caused by 0eb77e988032 ("vmstat: make vmstat_updater > deferrable > again and shut down on idle") which has placed quiet_vmstat into > cpu_idle_loop. The main reason here seems to be that the idle entry > has > to get over all zones and perform atomic operations for each vmstat > entry even though there might be no per cpu diffs. This is a > pointless > overhead for _each_ idle entry. > > Make sure that quiet_vmstat is as light as possible. > > First of all it doesn't make any sense to do any local sync if the > current cpu is already set in oncpu_stat_off because vmstat_update > puts > itself there only if there is nothing to do. > > Then we can check need_update which should be a cheap way to check > for > potential per-cpu diffs and only then do refresh_cpu_vm_stats. > > The original patch also did cancel_delayed_work which we are not > doing > here. There are two reasons for that. Firstly cancel_delayed_work > from > idle context will blow up on RT kernels (reported by Mike): > [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 > [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS > M7848W08.20C 09/23/2013 > [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 > 0000000000000000 > [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 > ffff88041ec501e0 > [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 > ffff88040b01fe88 > [ 2.283941] Call Trace: > [ 2.284797] [<ffffffff812d20e2>] dump_stack+0x49/0x67 > [ 2.285658] [<ffffffff81081095>] ___might_sleep+0xf5/0x180 > [ 2.286521] [<ffffffff815ff910>] rt_spin_lock+0x20/0x50 > [ 2.287382] [<ffffffff81075919>] try_to_grab_pending+0x69/0x240 > [ 2.288239] [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0 > [ 2.289094] [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0 > [ 2.289949] [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0 > [ 2.290800] [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20 > [ 2.291647] [<ffffffff81036164>] start_secondary+0x114/0x140 > > And secondly, even on !RT kernels it might add some non trivial > overhead > which is not necessary. Even if the vmstat worker wakes up and > preempts > idle then it will be most likely a single shot noop because the stats > were already synced and so it would end up on the oncpu_stat_off > anyway. > We just need to teach both vmstat_shepherd and vmstat_update to stop > scheduling the worker if there is nothing to do. > > Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/vmstat.c | 60 ++++++++++++++++++++++++++++++++++++++++----------- > --------- > 1 file changed, 40 insertions(+), 20 deletions(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 40b2c74ddf16..7747f85537b6 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct > *w) > * Counters were updated so we expect more updates > * to occur in the future. Keep on running the > * update worker thread. > + * If we were marked on cpu_stat_off clear the flag > + * so that vmstat_shepherd doesn't schedule us > again. > */ > - queue_delayed_work_on(smp_processor_id(), vmstat_wq, > - this_cpu_ptr(&vmstat_work), > - round_jiffies_relative(sysctl_stat_interval) > ); > + if (!cpumask_test_and_clear_cpu(smp_processor_id(), > + cpu_stat_off)) { > + queue_delayed_work_on(smp_processor_id(), > vmstat_wq, > + this_cpu_ptr(&vmstat_work), > + round_jiffies_relative(sysctl_stat_i > nterval)); > + } > } else { > /* > * We did not update any counters so the app may be > in > @@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct > *w) > * until the diffs stay at zero. The function is used by NOHZ and > can only be > * invoked when tick processing is not active. > */ > -void quiet_vmstat(void) > -{ > - if (system_state != SYSTEM_RUNNING) > - return; > - > - do { > - if (!cpumask_test_and_set_cpu(smp_processor_id(), > cpu_stat_off)) > - cancel_delayed_work(this_cpu_ptr(&vmstat_wor > k)); > - > - } while (refresh_cpu_vm_stats(false)); > -} > - > /* > * Check if the diffs for a certain cpu indicate that > * an update is needed. > @@ -1452,6 +1445,30 @@ static bool need_update(int cpu) > return false; > } > > +void quiet_vmstat(void) > +{ > + if (system_state != SYSTEM_RUNNING) > + return; > + > + /* > + * If we are already in hands of the shepherd then there > + * is nothing for us to do here. > + */ > + if (cpumask_test_and_set_cpu(smp_processor_id(), > cpu_stat_off)) > + return; > + > + if (!need_update(smp_processor_id())) > + return; > + > + /* > + * Just refresh counters and do not care about the pending > delayed > + * vmstat_update. It doesn't fire that often to matter and > canceling > + * it would be too expensive from this path. > + * vmstat_shepherd will take care about that for us. > + */ > + refresh_cpu_vm_stats(false); > +} > + > > /* > * Shepherd worker thread that checks the > @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct > work_struct *w) > get_online_cpus(); > /* Check processors whose vmstat worker threads have been > disabled */ > for_each_cpu(cpu, cpu_stat_off) > - if (need_update(cpu) && > - cpumask_test_and_clear_cpu(cpu, > cpu_stat_off)) > - > - queue_delayed_work_on(cpu, vmstat_wq, > - &per_cpu(vmstat_work, cpu), 0); > + if (need_update(cpu)) { > + if (cpumask_test_and_clear_cpu(cpu, > cpu_stat_off)) > + queue_delayed_work_on(cpu, > vmstat_wq, > + &per_cpu(vmstat_work, cpu), > 0); > + } else { > + cpumask_set_cpu(cpu, cpu_stat_off); > + cancel_delayed_work(this_cpu_ptr(&vmstat_wor > k)); > + } > > put_online_cpus(); > > -- > 2.7.0.rc3 > ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-27 16:48 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko 2016-01-27 17:03 ` Mike Galbraith @ 2016-01-27 18:26 ` Christoph Lameter 2016-01-28 15:21 ` Michal Hocko 2016-01-28 15:31 ` Michal Hocko 1 sibling, 2 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-27 18:26 UTC (permalink / raw) To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Wed, 27 Jan 2016, Michal Hocko wrote: > This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable > again and shut down on idle") which has placed quiet_vmstat into > cpu_idle_loop. The main reason here seems to be that the idle entry has > to get over all zones and perform atomic operations for each vmstat > entry even though there might be no per cpu diffs. This is a pointless > overhead for _each_ idle entry. > > Make sure that quiet_vmstat is as light as possible. > > First of all it doesn't make any sense to do any local sync if the > current cpu is already set in oncpu_stat_off because vmstat_update puts > itself there only if there is nothing to do. But there might be something to do that happened in the meantime because counters were incremented. It needs to be checked. The shepherd can do that but it will delay the folding of diffs for awhile. > +void quiet_vmstat(void) > +{ > + if (system_state != SYSTEM_RUNNING) > + return; > + > + /* > + * If we are already in hands of the shepherd then there > + * is nothing for us to do here. > + */ > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > + return; > + > + if (!need_update(smp_processor_id())) > + return; > + > + /* > + * Just refresh counters and do not care about the pending delayed > + * vmstat_update. It doesn't fire that often to matter and canceling > + * it would be too expensive from this path. > + * vmstat_shepherd will take care about that for us. > + */ > + refresh_cpu_vm_stats(false); > +} The problem here is that there will be an additional tick generated on idle. This is an issue for power because now the processor has to needlessly wake up again, do tick processing etc just to effectively do a cancel_delayed_work(). The cancelling of the work request is required to avoid this additonal tick. > @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w) > get_online_cpus(); > /* Check processors whose vmstat worker threads have been disabled */ > for_each_cpu(cpu, cpu_stat_off) > - if (need_update(cpu) && > - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) > - > - queue_delayed_work_on(cpu, vmstat_wq, > - &per_cpu(vmstat_work, cpu), 0); > + if (need_update(cpu)) { > + if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) > + queue_delayed_work_on(cpu, vmstat_wq, > + &per_cpu(vmstat_work, cpu), 0); > + } else { > + cpumask_set_cpu(cpu, cpu_stat_off); Umm the flag is already set right? This just causes a bouncing cacheline since we are accessing a global set of cpus,. Drop this line? > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); Ok so this is to move the cancel into the shepherd. Aside from the two issues mentioned this looks good. ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-27 18:26 ` Christoph Lameter @ 2016-01-28 15:21 ` Michal Hocko 2016-01-28 16:40 ` Christoph Lameter 2016-01-28 15:31 ` Michal Hocko 1 sibling, 1 reply; 114+ messages in thread From: Michal Hocko @ 2016-01-28 15:21 UTC (permalink / raw) To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Wed 27-01-16 12:26:16, Christoph Lameter wrote: > On Wed, 27 Jan 2016, Michal Hocko wrote: > > > This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable > > again and shut down on idle") which has placed quiet_vmstat into > > cpu_idle_loop. The main reason here seems to be that the idle entry has > > to get over all zones and perform atomic operations for each vmstat > > entry even though there might be no per cpu diffs. This is a pointless > > overhead for _each_ idle entry. > > > > Make sure that quiet_vmstat is as light as possible. > > > > First of all it doesn't make any sense to do any local sync if the > > current cpu is already set in oncpu_stat_off because vmstat_update puts > > itself there only if there is nothing to do. > > But there might be something to do that happened in the meantime because > counters were incremented. It needs to be checked. The shepherd can do > that but it will delay the folding of diffs for awhile. shepherd ticks with the same period so it should run relatively soon. What is important here the test_and_set is the fast path which will trigger a lot of the current CPU is mostly running in the userspace. > > +void quiet_vmstat(void) > > +{ > > + if (system_state != SYSTEM_RUNNING) > > + return; > > + > > + /* > > + * If we are already in hands of the shepherd then there > > + * is nothing for us to do here. > > + */ > > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > > + return; > > + > > + if (!need_update(smp_processor_id())) > > + return; > > + > > + /* > > + * Just refresh counters and do not care about the pending delayed > > + * vmstat_update. It doesn't fire that often to matter and canceling > > + * it would be too expensive from this path. > > + * vmstat_shepherd will take care about that for us. > > + */ > > + refresh_cpu_vm_stats(false); > > +} > > The problem here is that there will be an additional tick generated on > idle. This is an issue for power because now the processor has to > needlessly wake up again, do tick processing etc just to effectively do a > cancel_delayed_work(). > > The cancelling of the work request is required to avoid this additonal > tick. Yes but as mentioned in the changelog calling cancel_delayed_work is quite expensive to be run on each idle entry slow path. In addition it is RT unfriendly. So considering a single tick vs. other reasons I think this is a reasonable compromise. > > @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w) > > get_online_cpus(); > > /* Check processors whose vmstat worker threads have been disabled */ > > for_each_cpu(cpu, cpu_stat_off) > > - if (need_update(cpu) && > > - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) > > - > > - queue_delayed_work_on(cpu, vmstat_wq, > > - &per_cpu(vmstat_work, cpu), 0); > > + if (need_update(cpu)) { > > + if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) > > + queue_delayed_work_on(cpu, vmstat_wq, > > + &per_cpu(vmstat_work, cpu), 0); > > + } else { > > + cpumask_set_cpu(cpu, cpu_stat_off); > > Umm the flag is already set right? This just causes a bouncing cacheline > since we are accessing a global set of cpus,. Drop this line? Ohh, right you are. I will remove this. > > > + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); > > Ok so this is to move the cancel into the shepherd. Yeah, I have added a comment to explain that. > Aside from the two issues mentioned this looks good. Updated version: --- >From 1d0febff044382b88ff44d245b5afe1a6e402c62 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@suse.com> Date: Wed, 27 Jan 2016 17:24:22 +0100 Subject: [PATCH] mm, vmstat: make quiet_vmstat lighter Mike has reported a considerable overhead of refresh_cpu_vm_stats from the idle entry during pipe test: 12.89% [kernel] [k] refresh_cpu_vm_stats.isra.12 4.75% [kernel] [k] __schedule 4.70% [kernel] [k] mutex_unlock 3.14% [kernel] [k] __switch_to This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") which has placed quiet_vmstat into cpu_idle_loop. The main reason here seems to be that the idle entry has to get over all zones and perform atomic operations for each vmstat entry even though there might be no per cpu diffs. This is a pointless overhead for _each_ idle entry. Make sure that quiet_vmstat is as light as possible. First of all it doesn't make any sense to do any local sync if the current cpu is already set in oncpu_stat_off because vmstat_update puts itself there only if there is nothing to do. Then we can check need_update which should be a cheap way to check for potential per-cpu diffs and only then do refresh_cpu_vm_stats. The original patch also did cancel_delayed_work which we are not doing here. There are two reasons for that. Firstly cancel_delayed_work from idle context will blow up on RT kernels (reported by Mike): [ 2.279582] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7 [ 2.280444] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 [ 2.281316] ffff88040b00d640 ffff88040b01fe10 ffffffff812d20e2 0000000000000000 [ 2.282202] ffff88040b01fe30 ffffffff81081095 ffff88041ec4cee0 ffff88041ec501e0 [ 2.283073] ffff88040b01fe48 ffffffff815ff910 ffff88041ec4cee0 ffff88040b01fe88 [ 2.283941] Call Trace: [ 2.284797] [<ffffffff812d20e2>] dump_stack+0x49/0x67 [ 2.285658] [<ffffffff81081095>] ___might_sleep+0xf5/0x180 [ 2.286521] [<ffffffff815ff910>] rt_spin_lock+0x20/0x50 [ 2.287382] [<ffffffff81075919>] try_to_grab_pending+0x69/0x240 [ 2.288239] [<ffffffff81075b16>] cancel_delayed_work+0x26/0xe0 [ 2.289094] [<ffffffff8115ec05>] quiet_vmstat+0x75/0xa0 [ 2.289949] [<ffffffff8109ab38>] cpu_idle_loop+0x38/0x3e0 [ 2.290800] [<ffffffff8109aef3>] cpu_startup_entry+0x13/0x20 [ 2.291647] [<ffffffff81036164>] start_secondary+0x114/0x140 And secondly, even on !RT kernels it might add some non trivial overhead which is not necessary. Even if the vmstat worker wakes up and preempts idle then it will be most likely a single shot noop because the stats were already synced and so it would end up on the oncpu_stat_off anyway. We just need to teach both vmstat_shepherd and vmstat_update to stop scheduling the worker if there is nothing to do. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/vmstat.c | 64 ++++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 44 insertions(+), 20 deletions(-) diff --git a/mm/vmstat.c b/mm/vmstat.c index 40b2c74ddf16..eb30bf45bd55 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1396,10 +1396,15 @@ static void vmstat_update(struct work_struct *w) * Counters were updated so we expect more updates * to occur in the future. Keep on running the * update worker thread. + * If we were marked on cpu_stat_off clear the flag + * so that vmstat_shepherd doesn't schedule us again. */ - queue_delayed_work_on(smp_processor_id(), vmstat_wq, - this_cpu_ptr(&vmstat_work), - round_jiffies_relative(sysctl_stat_interval)); + if (!cpumask_test_and_clear_cpu(smp_processor_id(), + cpu_stat_off)) { + queue_delayed_work_on(smp_processor_id(), vmstat_wq, + this_cpu_ptr(&vmstat_work), + round_jiffies_relative(sysctl_stat_interval)); + } } else { /* * We did not update any counters so the app may be in @@ -1417,18 +1422,6 @@ static void vmstat_update(struct work_struct *w) * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. */ -void quiet_vmstat(void) -{ - if (system_state != SYSTEM_RUNNING) - return; - - do { - if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) - cancel_delayed_work(this_cpu_ptr(&vmstat_work)); - - } while (refresh_cpu_vm_stats(false)); -} - /* * Check if the diffs for a certain cpu indicate that * an update is needed. @@ -1452,6 +1445,30 @@ static bool need_update(int cpu) return false; } +void quiet_vmstat(void) +{ + if (system_state != SYSTEM_RUNNING) + return; + + /* + * If we are already in hands of the shepherd then there + * is nothing for us to do here. + */ + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) + return; + + if (!need_update(smp_processor_id())) + return; + + /* + * Just refresh counters and do not care about the pending delayed + * vmstat_update. It doesn't fire that often to matter and canceling + * it would be too expensive from this path. + * vmstat_shepherd will take care about that for us. + */ + refresh_cpu_vm_stats(false); +} + /* * Shepherd worker thread that checks the @@ -1470,11 +1487,18 @@ static void vmstat_shepherd(struct work_struct *w) get_online_cpus(); /* Check processors whose vmstat worker threads have been disabled */ for_each_cpu(cpu, cpu_stat_off) - if (need_update(cpu) && - cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) - - queue_delayed_work_on(cpu, vmstat_wq, - &per_cpu(vmstat_work, cpu), 0); + if (need_update(cpu)) { + if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off)) + queue_delayed_work_on(cpu, vmstat_wq, + &per_cpu(vmstat_work, cpu), 0); + } else { + /* + * Cancel the work if quiet_vmstat has put this + * cpu on cpu_stat_off because the work item might + * be still scheduled + */ + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); + } put_online_cpus(); -- 2.7.0.rc3 -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-28 15:21 ` Michal Hocko @ 2016-01-28 16:40 ` Christoph Lameter 2016-01-28 16:53 ` Michal Hocko 0 siblings, 1 reply; 114+ messages in thread From: Christoph Lameter @ 2016-01-28 16:40 UTC (permalink / raw) To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Thu, 28 Jan 2016, Michal Hocko wrote: > +void quiet_vmstat(void) > +{ > + if (system_state != SYSTEM_RUNNING) > + return; > + > + /* > + * If we are already in hands of the shepherd then there > + * is nothing for us to do here. > + */ > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > + return; > + > + if (!need_update(smp_processor_id())) > + return; > + You can drop the need_update check. refresh_cpu_vm_stats() does the same checks in a more efficient way. If you keep this the checks will performed twice. > + /* > + * Just refresh counters and do not care about the pending delayed > + * vmstat_update. It doesn't fire that often to matter and canceling > + * it would be too expensive from this path. > + * vmstat_shepherd will take care about that for us. > + */ > + refresh_cpu_vm_stats(false); > +} > + > ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-28 16:40 ` Christoph Lameter @ 2016-01-28 16:53 ` Michal Hocko 2016-01-28 17:05 ` Christoph Lameter 0 siblings, 1 reply; 114+ messages in thread From: Michal Hocko @ 2016-01-28 16:53 UTC (permalink / raw) To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Thu 28-01-16 10:40:10, Christoph Lameter wrote: > On Thu, 28 Jan 2016, Michal Hocko wrote: > > > +void quiet_vmstat(void) > > +{ > > + if (system_state != SYSTEM_RUNNING) > > + return; > > + > > + /* > > + * If we are already in hands of the shepherd then there > > + * is nothing for us to do here. > > + */ > > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > > + return; > > + > > + if (!need_update(smp_processor_id())) > > + return; > > + > > You can drop the need_update check. refresh_cpu_vm_stats() does the same > checks in a more efficient way. If you keep this the checks will > performed twice. refresh_cpu_vm_stats does this_cpu_xchg for each counter. Is it really more effective than memchr_inv? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-28 16:53 ` Michal Hocko @ 2016-01-28 17:05 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-28 17:05 UTC (permalink / raw) To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Thu, 28 Jan 2016, Michal Hocko wrote: > > You can drop the need_update check. refresh_cpu_vm_stats() does the same > > checks in a more efficient way. If you keep this the checks will > > performed twice. > > refresh_cpu_vm_stats does this_cpu_xchg for each counter. Is it really > more effective than memchr_inv? Ok makes sense. Acked-by: Christoph Lameter <cl@linux.com> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-27 18:26 ` Christoph Lameter 2016-01-28 15:21 ` Michal Hocko @ 2016-01-28 15:31 ` Michal Hocko 2016-01-28 15:37 ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko 2016-01-28 16:42 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter 1 sibling, 2 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-28 15:31 UTC (permalink / raw) To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Wed 27-01-16 12:26:16, Christoph Lameter wrote: > On Wed, 27 Jan 2016, Michal Hocko wrote: [...] > > +void quiet_vmstat(void) > > +{ > > + if (system_state != SYSTEM_RUNNING) > > + return; > > + > > + /* > > + * If we are already in hands of the shepherd then there > > + * is nothing for us to do here. > > + */ > > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > > + return; > > + > > + if (!need_update(smp_processor_id())) > > + return; > > + > > + /* > > + * Just refresh counters and do not care about the pending delayed > > + * vmstat_update. It doesn't fire that often to matter and canceling > > + * it would be too expensive from this path. > > + * vmstat_shepherd will take care about that for us. > > + */ > > + refresh_cpu_vm_stats(false); > > +} > > The problem here is that there will be an additional tick generated on > idle. This is an issue for power because now the processor has to > needlessly wake up again, do tick processing etc just to effectively do a > cancel_delayed_work(). Thinking about it some more, making vmstat_update deferrable should help to not interrupt idle no? --- diff --git a/mm/vmstat.c b/mm/vmstat.c index eb30bf45bd55..69537d2be6f6 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void) int cpu; for_each_possible_cpu(cpu) - INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu), + INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), vmstat_update); if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL)) -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-28 15:31 ` Michal Hocko @ 2016-01-28 15:37 ` Michal Hocko 2016-01-28 16:48 ` Christoph Lameter 2016-01-28 16:42 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter 1 sibling, 1 reply; 114+ messages in thread From: Michal Hocko @ 2016-01-28 15:37 UTC (permalink / raw) To: Christoph Lameter; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Thu 28-01-16 16:31:11, Michal Hocko wrote: > On Wed 27-01-16 12:26:16, Christoph Lameter wrote: > > On Wed, 27 Jan 2016, Michal Hocko wrote: > [...] > > > +void quiet_vmstat(void) > > > +{ > > > + if (system_state != SYSTEM_RUNNING) > > > + return; > > > + > > > + /* > > > + * If we are already in hands of the shepherd then there > > > + * is nothing for us to do here. > > > + */ > > > + if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off)) > > > + return; > > > + > > > + if (!need_update(smp_processor_id())) > > > + return; > > > + > > > + /* > > > + * Just refresh counters and do not care about the pending delayed > > > + * vmstat_update. It doesn't fire that often to matter and canceling > > > + * it would be too expensive from this path. > > > + * vmstat_shepherd will take care about that for us. > > > + */ > > > + refresh_cpu_vm_stats(false); > > > +} > > > > The problem here is that there will be an additional tick generated on > > idle. This is an issue for power because now the processor has to > > needlessly wake up again, do tick processing etc just to effectively do a > > cancel_delayed_work(). > > Thinking about it some more, making vmstat_update deferrable should help > to not interrupt idle no? With the full changelog >From 89858bf594c020f065b4f415a12d75bee64a7954 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@suse.com> Date: Thu, 28 Jan 2016 16:31:43 +0100 Subject: [PATCH] vmstat: make vmstat_update deferrable 0eb77e988032 ("vmstat: make vmstat_updater deferrable again and shut down on idle") made vmstat_shepherd deferrable. vmstat_update itself is still using standard timer which might interrupt idle task. This is possible because "mm, vmstat: make quiet_vmstat lighter" removed cancel_delayed_work from the quiet_vmstat. Change vmstat_work to use DEFERRABLE_WORK to prevent from pointless wakeups from the idle context. Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/vmstat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/vmstat.c b/mm/vmstat.c index eb30bf45bd55..69537d2be6f6 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void) int cpu; for_each_possible_cpu(cpu) - INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu), + INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), vmstat_update); if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL)) -- 2.7.0.rc3 -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-28 15:37 ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko @ 2016-01-28 16:48 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-28 16:48 UTC (permalink / raw) To: akpm, Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Thu, 28 Jan 2016, Michal Hocko wrote: > down on idle") made vmstat_shepherd deferrable. vmstat_update itself > is still using standard timer which might interrupt idle task. This > is possible because "mm, vmstat: make quiet_vmstat lighter" removed > cancel_delayed_work from the quiet_vmstat. Change vmstat_work to > use DEFERRABLE_WORK to prevent from pointless wakeups from the idle > context. Acked-by: Christoph Lameter <cl@linux.com> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle) 2016-01-28 15:31 ` Michal Hocko 2016-01-28 15:37 ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko @ 2016-01-28 16:42 ` Christoph Lameter 1 sibling, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-28 16:42 UTC (permalink / raw) To: Michal Hocko; +Cc: Mike Galbraith, Peter Zijlstra, LKML On Thu, 28 Jan 2016, Michal Hocko wrote: > Thinking about it some more, making vmstat_update deferrable should help > to not interrupt idle no? It will align with the tick then. Yes indeed this should be done. > --- > diff --git a/mm/vmstat.c b/mm/vmstat.c > index eb30bf45bd55..69537d2be6f6 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1512,7 +1512,7 @@ static void __init start_shepherd_timer(void) > int cpu; > > for_each_possible_cpu(cpu) > - INIT_DELAYED_WORK(per_cpu_ptr(&vmstat_work, cpu), > + INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > if (!alloc_cpumask_var(&cpu_stat_off, GFP_KERNEL)) > ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-22 16:46 ` Christoph Lameter @ 2016-01-24 16:57 ` Linus Torvalds -1 siblings, 0 replies; 114+ messages in thread From: Linus Torvalds @ 2016-01-24 16:57 UTC (permalink / raw) To: Christoph Lameter Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton On Fri, Jan 22, 2016 at 8:46 AM, Christoph Lameter <cl@linux.com> wrote: > > Subject: vmstat: Remove BUG_ON from vmstat_update > > If we detect that there is nothing to do just set the flag and do not check > if it was already set before. [..] Ok, I am assuming this is in Andrew's queue already, but this bug hit my machine overnight, so I'm applying it directly.. Linus ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-24 16:57 ` Linus Torvalds 0 siblings, 0 replies; 114+ messages in thread From: Linus Torvalds @ 2016-01-24 16:57 UTC (permalink / raw) To: Christoph Lameter Cc: Michal Hocko, Sasha Levin, LKML, linux-mm, Andrew Morton On Fri, Jan 22, 2016 at 8:46 AM, Christoph Lameter <cl@linux.com> wrote: > > Subject: vmstat: Remove BUG_ON from vmstat_update > > If we detect that there is nothing to do just set the flag and do not check > if it was already set before. [..] Ok, I am assuming this is in Andrew's queue already, but this bug hit my machine overnight, so I'm applying it directly.. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 14:37 ` Michal Hocko @ 2016-01-20 15:14 ` Christoph Lameter -1 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 15:14 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Michal Hocko wrote: > [CCing Andrew] > > I am just reading through this old discussion again because "vmstat: > make vmstat_updater deferrable again and shut down on idle" which seems > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > any follow up fix merged to linus tree Is there any way to reproce this issue? This is running through trinity right? Can we please get the exact syscall that causes this to occur? ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 15:14 ` Christoph Lameter 0 siblings, 0 replies; 114+ messages in thread From: Christoph Lameter @ 2016-01-20 15:14 UTC (permalink / raw) To: Michal Hocko; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed, 20 Jan 2016, Michal Hocko wrote: > [CCing Andrew] > > I am just reading through this old discussion again because "vmstat: > make vmstat_updater deferrable again and shut down on idle" which seems > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > any follow up fix merged to linus tree Is there any way to reproce this issue? This is running through trinity right? Can we please get the exact syscall that causes this to occur? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! 2016-01-20 15:14 ` Christoph Lameter @ 2016-01-20 15:20 ` Michal Hocko -1 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 15:20 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed 20-01-16 09:14:06, Christoph Lameter wrote: > On Wed, 20 Jan 2016, Michal Hocko wrote: > > > [CCing Andrew] > > > > I am just reading through this old discussion again because "vmstat: > > make vmstat_updater deferrable again and shut down on idle" which seems > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > > any follow up fix merged to linus tree > > Is there any way to reproce this issue? This is running through trinity > right? Can we please get the exact syscall that causes this to occur? As per the backtrace in the initial report this seems to be time dependent as the crash happens from _kthread_ context. So it doesn't seem to be directly related to any particular syscall. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: mm, vmstat: kernel BUG at mm/vmstat.c:1408! @ 2016-01-20 15:20 ` Michal Hocko 0 siblings, 0 replies; 114+ messages in thread From: Michal Hocko @ 2016-01-20 15:20 UTC (permalink / raw) To: Christoph Lameter; +Cc: Sasha Levin, LKML, linux-mm, Andrew Morton On Wed 20-01-16 09:14:06, Christoph Lameter wrote: > On Wed, 20 Jan 2016, Michal Hocko wrote: > > > [CCing Andrew] > > > > I am just reading through this old discussion again because "vmstat: > > make vmstat_updater deferrable again and shut down on idle" which seems > > to be the culprit AFAIU has been merged as 0eb77e988032 and I do not see > > any follow up fix merged to linus tree > > Is there any way to reproce this issue? This is running through trinity > right? Can we please get the exact syscall that causes this to occur? As per the backtrace in the initial report this seems to be time dependent as the crash happens from _kthread_ context. So it doesn't seem to be directly related to any particular syscall. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 114+ messages in thread
end of thread, other threads:[~2016-01-28 17:05 UTC | newest] Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-12-19 0:33 mm, vmstat: kernel BUG at mm/vmstat.c:1408! Sasha Levin 2015-12-19 0:33 ` Sasha Levin 2015-12-21 13:08 ` Christoph Lameter 2015-12-21 13:08 ` Christoph Lameter 2015-12-21 20:28 ` Sasha Levin 2015-12-21 20:28 ` Sasha Levin 2015-12-21 21:07 ` Sasha Levin 2015-12-21 21:07 ` Sasha Levin 2015-12-21 21:14 ` Christoph Lameter 2015-12-21 21:14 ` Christoph Lameter 2015-12-22 17:21 ` Christoph Lameter 2015-12-22 17:21 ` Christoph Lameter 2015-12-24 20:14 ` Sasha Levin 2015-12-29 17:01 ` Christoph Lameter 2015-12-29 17:01 ` Christoph Lameter 2015-12-29 17:18 ` Christoph Lameter 2015-12-29 17:18 ` Christoph Lameter 2016-01-04 18:05 ` Christoph Lameter 2016-01-04 18:05 ` Christoph Lameter 2016-01-04 18:46 ` Sasha Levin 2016-01-04 18:46 ` Sasha Levin 2016-01-12 11:31 ` Shiraz Hashim 2016-01-12 11:31 ` Shiraz Hashim 2016-01-12 12:23 ` Christoph Lameter 2016-01-12 12:23 ` Christoph Lameter 2016-01-12 12:27 ` Shiraz Hashim 2016-01-12 12:27 ` Shiraz Hashim 2016-01-13 11:36 ` Shiraz Hashim 2016-01-13 11:36 ` Shiraz Hashim 2016-01-13 12:32 ` Shiraz Hashim 2016-01-13 12:32 ` Shiraz Hashim 2016-01-14 21:06 ` Sasha Levin 2016-01-14 21:06 ` Sasha Levin 2016-01-20 14:37 ` Michal Hocko 2016-01-20 14:37 ` Michal Hocko 2016-01-20 14:56 ` Sasha Levin 2016-01-20 14:56 ` Sasha Levin 2016-01-20 15:10 ` Michal Hocko 2016-01-20 15:10 ` Michal Hocko 2016-01-20 15:20 ` Christoph Lameter 2016-01-20 15:20 ` Christoph Lameter 2016-01-20 15:49 ` Sasha Levin 2016-01-20 15:49 ` Sasha Levin 2016-01-20 15:55 ` Christoph Lameter 2016-01-20 15:55 ` Christoph Lameter 2016-01-20 21:28 ` Michal Hocko 2016-01-20 21:28 ` Michal Hocko 2016-01-20 21:57 ` Christoph Lameter 2016-01-20 21:57 ` Christoph Lameter 2016-01-21 8:24 ` Michal Hocko 2016-01-21 8:24 ` Michal Hocko 2016-01-21 15:45 ` Christoph Lameter 2016-01-21 15:45 ` Christoph Lameter 2016-01-21 16:51 ` Michal Hocko 2016-01-21 16:51 ` Michal Hocko 2016-01-21 17:38 ` Christoph Lameter 2016-01-21 17:38 ` Christoph Lameter 2016-01-22 11:00 ` Shiraz Hashim 2016-01-22 11:00 ` Shiraz Hashim 2016-01-22 14:04 ` Michal Hocko 2016-01-22 14:04 ` Michal Hocko 2016-01-22 16:07 ` Christoph Lameter 2016-01-22 16:07 ` Christoph Lameter 2016-01-22 16:12 ` Michal Hocko 2016-01-22 16:12 ` Michal Hocko 2016-01-22 16:46 ` Christoph Lameter 2016-01-22 16:46 ` Christoph Lameter 2016-01-22 17:12 ` Michal Hocko 2016-01-22 17:12 ` Michal Hocko 2016-01-23 16:21 ` fast path cycle muncher (vmstat: make vmstat_updater deferrable again and shut down on idle) Mike Galbraith 2016-01-24 0:33 ` Christoph Lameter 2016-01-24 2:46 ` Mike Galbraith 2016-01-24 3:46 ` Christoph Lameter 2016-01-24 5:36 ` Mike Galbraith 2016-01-25 17:42 ` Michal Hocko 2016-01-25 18:02 ` Christoph Lameter 2016-01-25 20:13 ` Michal Hocko 2016-01-26 16:25 ` Christoph Lameter 2016-01-26 18:31 ` Mike Galbraith 2016-01-26 18:34 ` Christoph Lameter 2016-01-26 18:45 ` Mike Galbraith 2016-01-26 19:20 ` Christoph Lameter 2016-01-27 3:12 ` Mike Galbraith 2016-01-27 4:15 ` Mike Galbraith 2016-01-27 16:28 ` Christoph Lameter 2016-01-28 15:36 ` Frederic Weisbecker 2016-01-28 16:42 ` Christoph Lameter 2016-01-26 2:14 ` Mike Galbraith 2016-01-26 2:25 ` Mike Galbraith 2016-01-26 16:26 ` Christoph Lameter 2016-01-26 17:39 ` Mike Galbraith 2016-01-26 18:19 ` Christoph Lameter 2016-01-26 16:26 ` Christoph Lameter 2016-01-26 17:08 ` Mike Galbraith 2016-01-26 18:22 ` Christoph Lameter 2016-01-26 19:09 ` Mike Galbraith 2016-01-26 19:22 ` Christoph Lameter 2016-01-27 16:48 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) " Michal Hocko 2016-01-27 17:03 ` Mike Galbraith 2016-01-27 18:26 ` Christoph Lameter 2016-01-28 15:21 ` Michal Hocko 2016-01-28 16:40 ` Christoph Lameter 2016-01-28 16:53 ` Michal Hocko 2016-01-28 17:05 ` Christoph Lameter 2016-01-28 15:31 ` Michal Hocko 2016-01-28 15:37 ` [PATCH] vmstat: make vmstat_update deferrable (was: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter) " Michal Hocko 2016-01-28 16:48 ` Christoph Lameter 2016-01-28 16:42 ` [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast " Christoph Lameter 2016-01-24 16:57 ` mm, vmstat: kernel BUG at mm/vmstat.c:1408! Linus Torvalds 2016-01-24 16:57 ` Linus Torvalds 2016-01-20 15:14 ` Christoph Lameter 2016-01-20 15:14 ` Christoph Lameter 2016-01-20 15:20 ` Michal Hocko 2016-01-20 15:20 ` Michal Hocko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.