* [next-20160506 mm,smp] hang up at csd_lock_wait() from drain_all_pages()
@ 2016-05-06 10:58 ` Tetsuo Handa
0 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-06 10:58 UTC (permalink / raw)
To: peterz, tglx, mingo, akpm, mgorman, mhocko; +Cc: linux-kernel, linux-mm
Hello.
I'm hitting this bug while running an OOM-killer torture test, but I can't tell
whether it is an mm bug. When the bug triggers, SysRq-l shows that multiple
CPUs are spinning in csd_lock_wait(), called from on_each_cpu_mask(),
called from drain_all_pages(). Note that drain_all_pages() passes an
unprotected global variable
static cpumask_t cpus_with_pcps;
to on_each_cpu_mask().
Question 1:
Is on_each_cpu_mask() safe if its struct cpumask argument (in this case
cpus_with_pcps) can be modified by other CPUs while it runs?
Question 2:
Is it legal to run an smp_call_func_t callback (in this case drain_local_pages())
concurrently on multiple CPUs when the callbacks can contend on the same
spinlock (in this case zone->lock, taken in free_pcppages_bulk(), called from
drain_pages_zone(), called from drain_local_pages())?
Question 3:
Are on_each_cpu_mask(mask1, func1, info1, true) and on_each_cpu_mask(mask2,
func2, info2, true) safe with respect to each other when called at the same
time? (In other words, do we need to guarantee that on_each_cpu_mask() calls
with wait == true are globally serialized?)
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160506.txt.xz .
----------------------------------------
[ 234.861274] sysrq: SysRq : Show backtrace of all active CPUs
[ 234.864808] Sending NMI to all CPUs:
[ 234.867281] NMI backtrace for cpu 2
[ 234.869576] CPU: 2 PID: 1181 Comm: smbd Not tainted 4.6.0-rc6-next-20160506 #124
[ 234.873806] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 234.879711] task: ffff88003c374000 ti: ffff88003afec000 task.ti: ffff88003afec000
[ 234.884219] RIP: 0010:[<ffffffff810f6c0f>] [<ffffffff810f6c0f>] smp_call_function_many+0x1ef/0x240
[ 234.890500] RSP: 0000:ffff88003afef9d8 EFLAGS: 00000202
[ 234.893838] RAX: 0000000000000000 RBX: ffffffff8114ad30 RCX: ffffe8fffee02360
[ 234.897968] RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff88003fb3a860
[ 234.902089] RBP: ffff88003afefa10 R08: 000000000000000b R09: 0000000000000000
[ 234.906288] R10: 0000000000000001 R11: ffff88003bac9f84 R12: 0000000000000000
[ 234.910464] R13: ffff88003d6987c0 R14: 0000000000000040 R15: 0000000000018780
[ 234.914637] FS: 00007f743eed2840(0000) GS:ffff88003d680000(0000) knlGS:0000000000000000
[ 234.919262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 234.922700] CR2: 00007f743b68ca00 CR3: 000000003875e000 CR4: 00000000001406e0
[ 234.926923] Stack:
[ 234.928529] 0000000000000002 0100000000000001 ffffffff82ce9c00 ffffffff8114ad30
[ 234.932819] 0000000000000000 0000000000000002 0000000000000000 ffff88003afefa40
[ 234.937150] ffffffff810f6d64 0000000000000040 0000000000000000 0000000000000003
[ 234.941562] Call Trace:
[ 234.943421] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 234.947388] [<ffffffff810f6d64>] on_each_cpu_mask+0x24/0x90
[ 234.950884] [<ffffffff8114b6da>] drain_all_pages+0xca/0xe0
[ 234.954296] [<ffffffff8114d5ac>] __alloc_pages_nodemask+0x70c/0xe30
[ 234.958175] [<ffffffff811972d6>] alloc_pages_current+0x96/0x1b0
[ 234.961818] [<ffffffff81142c6d>] __page_cache_alloc+0x12d/0x160
[ 234.965498] [<ffffffff811533a9>] __do_page_cache_readahead+0x109/0x360
[ 234.969434] [<ffffffff8115340b>] ? __do_page_cache_readahead+0x16b/0x360
[ 234.973504] [<ffffffff811435e7>] ? pagecache_get_page+0x27/0x260
[ 234.977185] [<ffffffff8114671b>] filemap_fault+0x31b/0x670
[ 234.980625] [<ffffffffa0251ab0>] ? xfs_ilock+0xd0/0xe0 [xfs]
[ 234.984134] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 234.988488] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 234.991775] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 234.995342] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 234.998910] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.002695] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.006205] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.009618] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.012798] Code: 63 d2 e8 65 bb 1e 00 3b 05 93 b7 c1 00 89 c2 0f 8d 99 fe ff ff 48 98 49 8b 4d 00 48 03 0c c5 e0 ed cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 eb be 48 c7 c2 80 1a d1 81 4c
89 fe 44 89 f7
[ 235.024075] NMI backtrace for cpu 1
[ 235.026474] CPU: 1 PID: 398 Comm: systemd-journal Not tainted 4.6.0-rc6-next-20160506 #124
[ 235.031317] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 235.039025] task: ffff8800386de080 ti: ffff880036550000 task.ti: ffff880036550000
[ 235.043399] RIP: 0010:[<ffffffff810f6c0f>] [<ffffffff810f6c0f>] smp_call_function_many+0x1ef/0x240
[ 235.048558] RSP: 0000:ffff8800365539d8 EFLAGS: 00000202
[ 235.051821] RAX: 0000000000000000 RBX: ffffffff8114ad30 RCX: ffffe8fffee02340
[ 235.056213] RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff88003fb3a8e0
[ 235.060439] RBP: ffff880036553a10 R08: 000000000000000d R09: 0000000000000000
[ 235.064642] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 235.068834] R13: ffff88003d6587c0 R14: 0000000000000040 R15: 0000000000018780
[ 235.073214] FS: 00007f0362f4f880(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
[ 235.077848] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.081284] CR2: 00007f0361b062e0 CR3: 00000000364d8000 CR4: 00000000001406e0
[ 235.086255] Stack:
[ 235.088069] 0000000000000001 0100000000000001 ffffffff82ce9c00 ffffffff8114ad30
[ 235.092562] 0000000000000000 0000000000000001 0000000000000000 ffff880036553a40
[ 235.096871] ffffffff810f6d64 0000000000000040 0000000000000000 0000000000000003
[ 235.101206] Call Trace:
[ 235.102992] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.106935] [<ffffffff810f6d64>] on_each_cpu_mask+0x24/0x90
[ 235.110327] [<ffffffff8114b6da>] drain_all_pages+0xca/0xe0
[ 235.113678] [<ffffffff8114d5ac>] __alloc_pages_nodemask+0x70c/0xe30
[ 235.117443] [<ffffffff811972d6>] alloc_pages_current+0x96/0x1b0
[ 235.121231] [<ffffffff81142c6d>] __page_cache_alloc+0x12d/0x160
[ 235.124842] [<ffffffff811533a9>] __do_page_cache_readahead+0x109/0x360
[ 235.128717] [<ffffffff8115340b>] ? __do_page_cache_readahead+0x16b/0x360
[ 235.132676] [<ffffffff811435e7>] ? pagecache_get_page+0x27/0x260
[ 235.138094] [<ffffffff8114671b>] filemap_fault+0x31b/0x670
[ 235.141695] [<ffffffffa0251ab0>] ? xfs_ilock+0xd0/0xe0 [xfs]
[ 235.145120] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 235.148808] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 235.151975] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 235.155636] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 235.159145] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.162643] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.166045] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.169412] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.172997] Code: 63 d2 e8 65 bb 1e 00 3b 05 93 b7 c1 00 89 c2 0f 8d 99 fe ff ff 48 98 49 8b 4d 00 48 03 0c c5 e0 ed cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 eb be 48 c7 c2 80 1a d1 81 4c
89 fe 44 89 f7
[ 235.184715] NMI backtrace for cpu 0
[ 235.187229] CPU: 0 PID: 1294 Comm: tgid=1294 Not tainted 4.6.0-rc6-next-20160506 #124
[ 235.192548] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 235.198853] task: ffff88003ad82100 ti: ffff88003ad88000 task.ti: ffff88003ad88000
[ 235.203368] RIP: 0010:[<ffffffff810c2d94>] [<ffffffff810c2d94>] __rwsem_do_wake+0x134/0x150
[ 235.208393] RSP: 0000:ffff88003ad8bc20 EFLAGS: 00000097
[ 235.211722] RAX: ffff880028647c00 RBX: ffff88003ad5e420 RCX: ffffffff00000000
[ 235.216326] RDX: fffffffe00000001 RSI: 0000000000000001 RDI: ffffffffffffffff
[ 235.220608] RBP: ffff88003ad8bc48 R08: 0000000000000001 R09: 0000000000000001
[ 235.225009] R10: ffff88003ad82100 R11: ffff88003ad82c98 R12: ffff88003ad5e420
[ 235.229220] R13: 0000000000000286 R14: 00000000024201ca R15: ffff88003ad8bdb0
[ 235.233383] FS: 00007f30f147f740(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
[ 235.238199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.241693] CR2: 00007f30ec0e18c8 CR3: 000000003ad71000 CR4: 00000000001406f0
[ 235.246447] Stack:
[ 235.248025] ffff88003ad5e420 ffff88003ad5e438 0000000000000286 00000000024201ca
[ 235.252431] ffff88003ad8bdb0 ffff88003ad8bc78 ffffffff810c31c5 ffff88003ad5e420
[ 235.256818] ffff880038086380 ffff88003d058468 00000000024201ca ffff88003ad8bcc0
[ 235.261162] Call Trace:
[ 235.262977] [<ffffffff810c31c5>] rwsem_wake+0x55/0xd0
[ 235.266244] [<ffffffff812dba9b>] call_rwsem_wake+0x1b/0x30
[ 235.269821] [<ffffffff810b9b90>] up_read+0x30/0x40
[ 235.272922] [<ffffffff8114637c>] __lock_page_or_retry+0x6c/0xf0
[ 235.276507] [<ffffffff811465ab>] filemap_fault+0x1ab/0x670
[ 235.279893] [<ffffffff81146730>] ? filemap_fault+0x330/0x670
[ 235.283374] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 235.287093] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 235.290439] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 235.294528] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 235.298067] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.301731] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.305507] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.309014] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.312348] Code: a1 48 8b 78 10 e8 ed 83 fd ff 5b 4c 89 e0 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 c2 31 f6 e9 43 ff ff ff 48 89 fa f0 49 0f c1 14 24 <83> ea 01 0f 84 15 ff ff ff e9 e3 fe ff ff 66 66
66 66 66 2e 0f
[ 235.323155] NMI backtrace for cpu 3
[ 235.325497] CPU: 3 PID: 1602 Comm: tgid=1332 Not tainted 4.6.0-rc6-next-20160506 #124
[ 235.330345] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 235.336438] task: ffff8800287a80c0 ti: ffff8800287ac000 task.ti: ffff8800287ac000
[ 235.340943] RIP: 0010:[<ffffffff81049b5b>] [<ffffffff81049b5b>] __default_send_IPI_dest_field+0x3b/0x60
[ 235.346341] RSP: 0000:ffff88003d6c3be8 EFLAGS: 00000046
[ 235.349666] RAX: 0000000000000400 RBX: 0000000000000003 RCX: 0000000000000003
[ 235.353976] RDX: 0000000000000400 RSI: 0000000000000002 RDI: 0000000006000000
[ 235.358389] RBP: ffff88003d6c3bf8 R08: 000000000000000f R09: 0000000000000000
[ 235.363064] R10: 0000000000000001 R11: 000000000000109f R12: 000000000000a12e
[ 235.367468] R13: ffffffff81d16bc0 R14: 0000000000000002 R15: 0000000000000046
[ 235.371864] FS: 00007f30f147f740(0000) GS:ffff88003d6c0000(0000) knlGS:0000000000000000
[ 235.376555] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.380145] CR2: 00007f743df6f218 CR3: 000000003aa1a000 CR4: 00000000001406e0
[ 235.384447] Stack:
[ 235.386016] 000000003d6c3bf8 0000000600000002 ffff88003d6c3c30 ffffffff81049c50
[ 235.390469] 0000000000000001 0000000000000003 ffffffff8104ad90 0000000000000040
[ 235.395174] ffff88003cb93b88 ffff88003d6c3c40 ffffffff8104ada6 ffff88003d6c3c70
[ 235.399489] Call Trace:
[ 235.401306] <IRQ> [<ffffffff81049c50>] default_send_IPI_mask_sequence_phys+0x50/0xb0
[ 235.406101] [<ffffffff8104ad90>] ? irq_force_complete_move+0x120/0x120
[ 235.410118] [<ffffffff8104ada6>] nmi_raise_cpu_backtrace+0x16/0x20
[ 235.413895] [<ffffffff812d20cf>] nmi_trigger_all_cpu_backtrace+0xff/0x110
[ 235.418023] [<ffffffff8104ade4>] arch_trigger_all_cpu_backtrace+0x14/0x20
[ 235.422167] [<ffffffff813975fe>] sysrq_handle_showallcpus+0xe/0x10
[ 235.425929] [<ffffffff81397d01>] __handle_sysrq+0x131/0x210
[ 235.429379] [<ffffffff81397bd0>] ? __sysrq_get_key_op+0x30/0x30
[ 235.433363] [<ffffffff8139817e>] sysrq_filter+0x36e/0x3b0
[ 235.436672] [<ffffffff81469d52>] input_to_handler+0x52/0x100
[ 235.440317] [<ffffffff8146ba7c>] input_pass_values.part.5+0x1bc/0x260
[ 235.444178] [<ffffffff8146b8c0>] ? input_devices_seq_next+0x20/0x20
[ 235.448137] [<ffffffff8146be7b>] input_handle_event+0xfb/0x4f0
[ 235.451687] [<ffffffff8146c2be>] input_event+0x4e/0x70
[ 235.455004] [<ffffffff81474a0d>] atkbd_interrupt+0x5bd/0x6a0
[ 235.458493] [<ffffffff814670f1>] serio_interrupt+0x41/0x80
[ 235.461836] [<ffffffff81467d0a>] i8042_interrupt+0x1da/0x3a0
[ 235.475800] [<ffffffff810d06a3>] handle_irq_event_percpu+0x33/0x110
[ 235.479609] [<ffffffff810d07b4>] handle_irq_event+0x34/0x60
[ 235.483096] [<ffffffff810d3ac4>] handle_edge_irq+0xa4/0x140
[ 235.486581] [<ffffffff81027c98>] handle_irq+0x18/0x30
[ 235.489883] [<ffffffff8102756e>] do_IRQ+0x5e/0x120
[ 235.493153] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.496848] [<ffffffff81619896>] common_interrupt+0x96/0x96
[ 235.500232] <EOI> [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.504513] [<ffffffff810f6c0f>] ? smp_call_function_many+0x1ef/0x240
[ 235.508409] [<ffffffff810f6beb>] ? smp_call_function_many+0x1cb/0x240
[ 235.512457] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.516123] [<ffffffff810f6d64>] on_each_cpu_mask+0x24/0x90
[ 235.519496] [<ffffffff8114b6da>] drain_all_pages+0xca/0xe0
[ 235.522973] [<ffffffff8114d5ac>] __alloc_pages_nodemask+0x70c/0xe30
[ 235.526736] [<ffffffff811972d6>] alloc_pages_current+0x96/0x1b0
[ 235.530238] [<ffffffff81142c6d>] __page_cache_alloc+0x12d/0x160
[ 235.533858] [<ffffffff811533a9>] __do_page_cache_readahead+0x109/0x360
[ 235.537651] [<ffffffff8115340b>] ? __do_page_cache_readahead+0x16b/0x360
[ 235.541675] [<ffffffff811435e7>] ? pagecache_get_page+0x27/0x260
[ 235.545568] [<ffffffff8114671b>] filemap_fault+0x31b/0x670
[ 235.548926] [<ffffffffa0251ab0>] ? xfs_ilock+0xd0/0xe0 [xfs]
[ 235.552416] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 235.556215] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 235.559306] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 235.562962] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 235.566573] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.570058] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.573929] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.577161] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.580186] Code: 90 8b 04 25 00 d3 5f ff f6 c4 10 75 f2 c1 e7 18 89 3c 25 10 d3 5f ff 89 f0 09 d0 80 ce 04 83 fe 02 0f 44 c2 89 04 25 00 d3 5f ff <c9> c3 48 8b 05 e4 55 cc 00 89 55 f4 89 75 f8 89
7d fc ff 90 18
----------------------------------------
----------------------------------------
00000000000f4810 <smp_call_function_many>:
* hardware interrupt handler or from a bottom half handler. Preemption
* must be disabled when calling this function.
*/
void smp_call_function_many(const struct cpumask *mask,
smp_call_func_t func, void *info, bool wait)
(...snipped...)
static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
{
smp_ops.send_call_func_ipi(mask);
f49ac: 49 8b 7d 08 mov 0x8(%r13),%rdi
f49b0: ff 15 00 00 00 00 callq *0x0(%rip) # f49b6 <smp_call_function_many+0x1a6>
}
/* Send a message to all CPUs in the map */
arch_send_call_function_ipi_mask(cfd->cpumask);
if (wait) {
f49b6: 80 7d d0 00 cmpb $0x0,-0x30(%rbp)
f49ba: ba ff ff ff ff mov $0xffffffff,%edx
f49bf: 0f 84 bd fe ff ff je f4882 <smp_call_function_many+0x72>
f49c5: 48 63 35 00 00 00 00 movslq 0x0(%rip),%rsi # f49cc <smp_call_function_many+0x1bc>
f49cc: 49 8b 7d 08 mov 0x8(%r13),%rdi
f49d0: 83 c2 01 add $0x1,%edx
f49d3: 48 63 d2 movslq %edx,%rdx
f49d6: e8 00 00 00 00 callq f49db <smp_call_function_many+0x1cb>
for_each_cpu(cpu, cfd->cpumask) {
f49db: 3b 05 00 00 00 00 cmp 0x0(%rip),%eax # f49e1 <smp_call_function_many+0x1d1>
f49e1: 89 c2 mov %eax,%edx
f49e3: 0f 8d 99 fe ff ff jge f4882 <smp_call_function_many+0x72>
struct call_single_data *csd;
csd = per_cpu_ptr(cfd->csd, cpu);
f49e9: 48 98 cltq
f49eb: 49 8b 4d 00 mov 0x0(%r13),%rcx
f49ef: 48 03 0c c5 00 00 00 add 0x0(,%rax,8),%rcx
f49f6: 00
* previous function call. For multi-cpu calls its even more interesting
* as we'll have to ensure no other cpu is observing our csd.
*/
static __always_inline void csd_lock_wait(struct call_single_data *csd)
{
smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
f49f7: f6 41 18 01 testb $0x1,0x18(%rcx)
f49fb: 74 08 je f4a05 <smp_call_function_many+0x1f5>
f49fd: f3 90 pause
f49ff: f6 41 18 01 testb $0x1,0x18(%rcx) /***** smp_call_function_many+0x1ef is here *****/
f4a03: 75 f8 jne f49fd <smp_call_function_many+0x1ed>
f4a05: eb be jmp f49c5 <smp_call_function_many+0x1b5>
&& !oops_in_progress && !early_boot_irqs_disabled);
/* Try to fastpath. So, what's a CPU they want? Ignoring this one. */
cpu = cpumask_first_and(mask, cpu_online_mask);
if (cpu == this_cpu)
cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
f4a07: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
f4a0e: 4c 89 fe mov %r15,%rsi
f4a11: 44 89 f7 mov %r14d,%edi
f4a14: e8 00 00 00 00 callq f4a19 <smp_call_function_many+0x209>
f4a19: 41 89 c5 mov %eax,%r13d
f4a1c: e9 58 fe ff ff jmpq f4879 <smp_call_function_many+0x69>
if (next_cpu == this_cpu)
next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
/* Fastpath: do that cpu by itself. */
----------------------------------------
Source code for OOM-killer torture test.
----------------------------------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/wait.h>

static int memory_eater(void *unused)
{
	char *buf = NULL;
	unsigned long size = 0;
	while (1) {
		char *tmp = realloc(buf, size + 4096);
		if (!tmp)
			break;
		buf = tmp;
		buf[size] = 0;
		size += 4096;
		size %= 1048576;
	}
	kill(getpid(), SIGKILL);
	return 0;
}

static void child(void)
{
	char *stack = malloc(4096 * 2);
	char from[128] = { };
	char to[128] = { };
	const pid_t pid = getpid();
	unsigned char prev = 0;
	int fd = open("/proc/self/oom_score_adj", O_WRONLY);
	write(fd, "1000", 4);
	close(fd);
	snprintf(from, sizeof(from), "tgid=%u", pid);
	prctl(PR_SET_NAME, (unsigned long) from, 0, 0, 0);
	srand(pid);
	snprintf(from, sizeof(from), "file.%u-0", pid);
	fd = open(from, O_WRONLY | O_CREAT, 0600);
	if (fd == -1)
		_exit(1);
	if (clone(memory_eater, stack + 4096, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL) == -1)
		_exit(1);
	while (1) {
		const unsigned char next = rand();
		snprintf(from, sizeof(from), "file.%u-%u", pid, prev);
		snprintf(to, sizeof(to), "file.%u-%u", pid, next);
		prev = next;
		rename(from, to);
		write(fd, "", 1);
	}
	_exit(0);
}

int main(int argc, char *argv[])
{
	if (chdir("/tmp"))
		return 1;
	if (fork() == 0) {
		char *buf = NULL;
		unsigned long size;
		unsigned long i;
		for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
			char *cp = realloc(buf, size);
			if (!cp) {
				size >>= 1;
				break;
			}
			buf = cp;
		}
		/* Will cause OOM due to overcommit */
		for (i = 0; i < size; i += 4096)
			buf[i] = 0;
		while (1)
			pause();
	} else {
		int children = 1024;
		while (1) {
			while (children > 0) {
				switch (fork()) {
				case 0:
					child();
				case -1:
					sleep(1);
					break;
				default:
					children--;
				}
			}
			wait(NULL);
			children++;
		}
	}
	return 0;
}
----------------------------------------
^ permalink raw reply [flat|nested] 19+ messages in thread
* [next-20160506 mm,smp] hang up at csd_lock_wait() from drain_all_pages()
@ 2016-05-06 10:58 ` Tetsuo Handa
0 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-06 10:58 UTC (permalink / raw)
To: peterz, tglx, mingo, akpm, mgorman, mhocko; +Cc: linux-kernel, linux-mm
Hello.
I'm hitting this bug while doing OOM-killer torture test, but I can't tell
whether this is a mm bug. When I hit this bug, SysRq-l shows that multiple
CPUs are spinning at csd_lock_wait() called from on_each_cpu_mask()
called from drain_all_pages(). We can see that drain_all_pages() is using
unprotected global variable
static cpumask_t cpus_with_pcps;
when calling on_each_cpu_mask().
Question 1:
Is on_each_cpu_mask() safe if struct cpumask argument (in this case
cpus_with_pcps) is modified by other CPUs?
Question 2:
Is it legal to call smp_call_func_t callback (in this case drain_local_pages())
concurrently which could contend on same spinlock (in this case zone->lock
in free_pcppages_bulk() called from drain_pages_zone() called from
drain_local_pages())?
Question 3:
Are on_each_cpu_mask(mask1, func1, info1, true) and on_each_cpu_mask(mask2,
func2, info2, true) safe each other when they are called at the same time?
(In other words, do we need to guarantee that on_each_cpu_mask() calls with
wait == true are globally serialized?)
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160506.txt.xz .
----------------------------------------
[ 234.861274] sysrq: SysRq : Show backtrace of all active CPUs
[ 234.864808] Sending NMI to all CPUs:
[ 234.867281] NMI backtrace for cpu 2
[ 234.869576] CPU: 2 PID: 1181 Comm: smbd Not tainted 4.6.0-rc6-next-20160506 #124
[ 234.873806] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 234.879711] task: ffff88003c374000 ti: ffff88003afec000 task.ti: ffff88003afec000
[ 234.884219] RIP: 0010:[<ffffffff810f6c0f>] [<ffffffff810f6c0f>] smp_call_function_many+0x1ef/0x240
[ 234.890500] RSP: 0000:ffff88003afef9d8 EFLAGS: 00000202
[ 234.893838] RAX: 0000000000000000 RBX: ffffffff8114ad30 RCX: ffffe8fffee02360
[ 234.897968] RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff88003fb3a860
[ 234.902089] RBP: ffff88003afefa10 R08: 000000000000000b R09: 0000000000000000
[ 234.906288] R10: 0000000000000001 R11: ffff88003bac9f84 R12: 0000000000000000
[ 234.910464] R13: ffff88003d6987c0 R14: 0000000000000040 R15: 0000000000018780
[ 234.914637] FS: 00007f743eed2840(0000) GS:ffff88003d680000(0000) knlGS:0000000000000000
[ 234.919262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 234.922700] CR2: 00007f743b68ca00 CR3: 000000003875e000 CR4: 00000000001406e0
[ 234.926923] Stack:
[ 234.928529] 0000000000000002 0100000000000001 ffffffff82ce9c00 ffffffff8114ad30
[ 234.932819] 0000000000000000 0000000000000002 0000000000000000 ffff88003afefa40
[ 234.937150] ffffffff810f6d64 0000000000000040 0000000000000000 0000000000000003
[ 234.941562] Call Trace:
[ 234.943421] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 234.947388] [<ffffffff810f6d64>] on_each_cpu_mask+0x24/0x90
[ 234.950884] [<ffffffff8114b6da>] drain_all_pages+0xca/0xe0
[ 234.954296] [<ffffffff8114d5ac>] __alloc_pages_nodemask+0x70c/0xe30
[ 234.958175] [<ffffffff811972d6>] alloc_pages_current+0x96/0x1b0
[ 234.961818] [<ffffffff81142c6d>] __page_cache_alloc+0x12d/0x160
[ 234.965498] [<ffffffff811533a9>] __do_page_cache_readahead+0x109/0x360
[ 234.969434] [<ffffffff8115340b>] ? __do_page_cache_readahead+0x16b/0x360
[ 234.973504] [<ffffffff811435e7>] ? pagecache_get_page+0x27/0x260
[ 234.977185] [<ffffffff8114671b>] filemap_fault+0x31b/0x670
[ 234.980625] [<ffffffffa0251ab0>] ? xfs_ilock+0xd0/0xe0 [xfs]
[ 234.984134] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 234.988488] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 234.991775] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 234.995342] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 234.998910] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.002695] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.006205] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.009618] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.012798] Code: 63 d2 e8 65 bb 1e 00 3b 05 93 b7 c1 00 89 c2 0f 8d 99 fe ff ff 48 98 49 8b 4d 00 48 03 0c c5 e0 ed cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 eb be 48 c7 c2 80 1a d1 81 4c
89 fe 44 89 f7
[ 235.024075] NMI backtrace for cpu 1
[ 235.026474] CPU: 1 PID: 398 Comm: systemd-journal Not tainted 4.6.0-rc6-next-20160506 #124
[ 235.031317] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 235.039025] task: ffff8800386de080 ti: ffff880036550000 task.ti: ffff880036550000
[ 235.043399] RIP: 0010:[<ffffffff810f6c0f>] [<ffffffff810f6c0f>] smp_call_function_many+0x1ef/0x240
[ 235.048558] RSP: 0000:ffff8800365539d8 EFLAGS: 00000202
[ 235.051821] RAX: 0000000000000000 RBX: ffffffff8114ad30 RCX: ffffe8fffee02340
[ 235.056213] RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff88003fb3a8e0
[ 235.060439] RBP: ffff880036553a10 R08: 000000000000000d R09: 0000000000000000
[ 235.064642] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 235.068834] R13: ffff88003d6587c0 R14: 0000000000000040 R15: 0000000000018780
[ 235.073214] FS: 00007f0362f4f880(0000) GS:ffff88003d640000(0000) knlGS:0000000000000000
[ 235.077848] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.081284] CR2: 00007f0361b062e0 CR3: 00000000364d8000 CR4: 00000000001406e0
[ 235.086255] Stack:
[ 235.088069] 0000000000000001 0100000000000001 ffffffff82ce9c00 ffffffff8114ad30
[ 235.092562] 0000000000000000 0000000000000001 0000000000000000 ffff880036553a40
[ 235.096871] ffffffff810f6d64 0000000000000040 0000000000000000 0000000000000003
[ 235.101206] Call Trace:
[ 235.102992] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.106935] [<ffffffff810f6d64>] on_each_cpu_mask+0x24/0x90
[ 235.110327] [<ffffffff8114b6da>] drain_all_pages+0xca/0xe0
[ 235.113678] [<ffffffff8114d5ac>] __alloc_pages_nodemask+0x70c/0xe30
[ 235.117443] [<ffffffff811972d6>] alloc_pages_current+0x96/0x1b0
[ 235.121231] [<ffffffff81142c6d>] __page_cache_alloc+0x12d/0x160
[ 235.124842] [<ffffffff811533a9>] __do_page_cache_readahead+0x109/0x360
[ 235.128717] [<ffffffff8115340b>] ? __do_page_cache_readahead+0x16b/0x360
[ 235.132676] [<ffffffff811435e7>] ? pagecache_get_page+0x27/0x260
[ 235.138094] [<ffffffff8114671b>] filemap_fault+0x31b/0x670
[ 235.141695] [<ffffffffa0251ab0>] ? xfs_ilock+0xd0/0xe0 [xfs]
[ 235.145120] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 235.148808] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 235.151975] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 235.155636] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 235.159145] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.162643] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.166045] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.169412] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.172997] Code: 63 d2 e8 65 bb 1e 00 3b 05 93 b7 c1 00 89 c2 0f 8d 99 fe ff ff 48 98 49 8b 4d 00 48 03 0c c5 e0 ed cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 eb be 48 c7 c2 80 1a d1 81 4c
89 fe 44 89 f7
[ 235.184715] NMI backtrace for cpu 0
[ 235.187229] CPU: 0 PID: 1294 Comm: tgid=1294 Not tainted 4.6.0-rc6-next-20160506 #124
[ 235.192548] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 235.198853] task: ffff88003ad82100 ti: ffff88003ad88000 task.ti: ffff88003ad88000
[ 235.203368] RIP: 0010:[<ffffffff810c2d94>] [<ffffffff810c2d94>] __rwsem_do_wake+0x134/0x150
[ 235.208393] RSP: 0000:ffff88003ad8bc20 EFLAGS: 00000097
[ 235.211722] RAX: ffff880028647c00 RBX: ffff88003ad5e420 RCX: ffffffff00000000
[ 235.216326] RDX: fffffffe00000001 RSI: 0000000000000001 RDI: ffffffffffffffff
[ 235.220608] RBP: ffff88003ad8bc48 R08: 0000000000000001 R09: 0000000000000001
[ 235.225009] R10: ffff88003ad82100 R11: ffff88003ad82c98 R12: ffff88003ad5e420
[ 235.229220] R13: 0000000000000286 R14: 00000000024201ca R15: ffff88003ad8bdb0
[ 235.233383] FS: 00007f30f147f740(0000) GS:ffff88003d600000(0000) knlGS:0000000000000000
[ 235.238199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.241693] CR2: 00007f30ec0e18c8 CR3: 000000003ad71000 CR4: 00000000001406f0
[ 235.246447] Stack:
[ 235.248025] ffff88003ad5e420 ffff88003ad5e438 0000000000000286 00000000024201ca
[ 235.252431] ffff88003ad8bdb0 ffff88003ad8bc78 ffffffff810c31c5 ffff88003ad5e420
[ 235.256818] ffff880038086380 ffff88003d058468 00000000024201ca ffff88003ad8bcc0
[ 235.261162] Call Trace:
[ 235.262977] [<ffffffff810c31c5>] rwsem_wake+0x55/0xd0
[ 235.266244] [<ffffffff812dba9b>] call_rwsem_wake+0x1b/0x30
[ 235.269821] [<ffffffff810b9b90>] up_read+0x30/0x40
[ 235.272922] [<ffffffff8114637c>] __lock_page_or_retry+0x6c/0xf0
[ 235.276507] [<ffffffff811465ab>] filemap_fault+0x1ab/0x670
[ 235.279893] [<ffffffff81146730>] ? filemap_fault+0x330/0x670
[ 235.283374] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 235.287093] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 235.290439] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 235.294528] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 235.298067] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.301731] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.305507] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.309014] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.312348] Code: a1 48 8b 78 10 e8 ed 83 fd ff 5b 4c 89 e0 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 c2 31 f6 e9 43 ff ff ff 48 89 fa f0 49 0f c1 14 24 <83> ea 01 0f 84 15 ff ff ff e9 e3 fe ff ff 66 66
66 66 66 2e 0f
[ 235.323155] NMI backtrace for cpu 3
[ 235.325497] CPU: 3 PID: 1602 Comm: tgid=1332 Not tainted 4.6.0-rc6-next-20160506 #124
[ 235.330345] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 235.336438] task: ffff8800287a80c0 ti: ffff8800287ac000 task.ti: ffff8800287ac000
[ 235.340943] RIP: 0010:[<ffffffff81049b5b>] [<ffffffff81049b5b>] __default_send_IPI_dest_field+0x3b/0x60
[ 235.346341] RSP: 0000:ffff88003d6c3be8 EFLAGS: 00000046
[ 235.349666] RAX: 0000000000000400 RBX: 0000000000000003 RCX: 0000000000000003
[ 235.353976] RDX: 0000000000000400 RSI: 0000000000000002 RDI: 0000000006000000
[ 235.358389] RBP: ffff88003d6c3bf8 R08: 000000000000000f R09: 0000000000000000
[ 235.363064] R10: 0000000000000001 R11: 000000000000109f R12: 000000000000a12e
[ 235.367468] R13: ffffffff81d16bc0 R14: 0000000000000002 R15: 0000000000000046
[ 235.371864] FS: 00007f30f147f740(0000) GS:ffff88003d6c0000(0000) knlGS:0000000000000000
[ 235.376555] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 235.380145] CR2: 00007f743df6f218 CR3: 000000003aa1a000 CR4: 00000000001406e0
[ 235.384447] Stack:
[ 235.386016] 000000003d6c3bf8 0000000600000002 ffff88003d6c3c30 ffffffff81049c50
[ 235.390469] 0000000000000001 0000000000000003 ffffffff8104ad90 0000000000000040
[ 235.395174] ffff88003cb93b88 ffff88003d6c3c40 ffffffff8104ada6 ffff88003d6c3c70
[ 235.399489] Call Trace:
[ 235.401306] <IRQ> \x01d [<ffffffff81049c50>] default_send_IPI_mask_sequence_phys+0x50/0xb0
[ 235.406101] [<ffffffff8104ad90>] ? irq_force_complete_move+0x120/0x120
[ 235.410118] [<ffffffff8104ada6>] nmi_raise_cpu_backtrace+0x16/0x20
[ 235.413895] [<ffffffff812d20cf>] nmi_trigger_all_cpu_backtrace+0xff/0x110
[ 235.418023] [<ffffffff8104ade4>] arch_trigger_all_cpu_backtrace+0x14/0x20
[ 235.422167] [<ffffffff813975fe>] sysrq_handle_showallcpus+0xe/0x10
[ 235.425929] [<ffffffff81397d01>] __handle_sysrq+0x131/0x210
[ 235.429379] [<ffffffff81397bd0>] ? __sysrq_get_key_op+0x30/0x30
[ 235.433363] [<ffffffff8139817e>] sysrq_filter+0x36e/0x3b0
[ 235.436672] [<ffffffff81469d52>] input_to_handler+0x52/0x100
[ 235.440317] [<ffffffff8146ba7c>] input_pass_values.part.5+0x1bc/0x260
[ 235.444178] [<ffffffff8146b8c0>] ? input_devices_seq_next+0x20/0x20
[ 235.448137] [<ffffffff8146be7b>] input_handle_event+0xfb/0x4f0
[ 235.451687] [<ffffffff8146c2be>] input_event+0x4e/0x70
[ 235.455004] [<ffffffff81474a0d>] atkbd_interrupt+0x5bd/0x6a0
[ 235.458493] [<ffffffff814670f1>] serio_interrupt+0x41/0x80
[ 235.461836] [<ffffffff81467d0a>] i8042_interrupt+0x1da/0x3a0
[ 235.475800] [<ffffffff810d06a3>] handle_irq_event_percpu+0x33/0x110
[ 235.479609] [<ffffffff810d07b4>] handle_irq_event+0x34/0x60
[ 235.483096] [<ffffffff810d3ac4>] handle_edge_irq+0xa4/0x140
[ 235.486581] [<ffffffff81027c98>] handle_irq+0x18/0x30
[ 235.489883] [<ffffffff8102756e>] do_IRQ+0x5e/0x120
[ 235.493153] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.496848] [<ffffffff81619896>] common_interrupt+0x96/0x96
[ 235.500232] <EOI> \x01d [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.504513] [<ffffffff810f6c0f>] ? smp_call_function_many+0x1ef/0x240
[ 235.508409] [<ffffffff810f6beb>] ? smp_call_function_many+0x1cb/0x240
[ 235.512457] [<ffffffff8114ad30>] ? page_alloc_cpu_notify+0x40/0x40
[ 235.516123] [<ffffffff810f6d64>] on_each_cpu_mask+0x24/0x90
[ 235.519496] [<ffffffff8114b6da>] drain_all_pages+0xca/0xe0
[ 235.522973] [<ffffffff8114d5ac>] __alloc_pages_nodemask+0x70c/0xe30
[ 235.526736] [<ffffffff811972d6>] alloc_pages_current+0x96/0x1b0
[ 235.530238] [<ffffffff81142c6d>] __page_cache_alloc+0x12d/0x160
[ 235.533858] [<ffffffff811533a9>] __do_page_cache_readahead+0x109/0x360
[ 235.537651] [<ffffffff8115340b>] ? __do_page_cache_readahead+0x16b/0x360
[ 235.541675] [<ffffffff811435e7>] ? pagecache_get_page+0x27/0x260
[ 235.545568] [<ffffffff8114671b>] filemap_fault+0x31b/0x670
[ 235.548926] [<ffffffffa0251ab0>] ? xfs_ilock+0xd0/0xe0 [xfs]
[ 235.552416] [<ffffffffa0245a79>] xfs_filemap_fault+0x39/0x60 [xfs]
[ 235.556215] [<ffffffff811700db>] __do_fault+0x6b/0x120
[ 235.559306] [<ffffffff811768c7>] handle_mm_fault+0xed7/0x16f0
[ 235.562962] [<ffffffff81175a38>] ? handle_mm_fault+0x48/0x16f0
[ 235.566573] [<ffffffff8105af27>] ? __do_page_fault+0x127/0x4d0
[ 235.570058] [<ffffffff8105afb5>] __do_page_fault+0x1b5/0x4d0
[ 235.573929] [<ffffffff8105b300>] do_page_fault+0x30/0x80
[ 235.577161] [<ffffffff8161aa28>] page_fault+0x28/0x30
[ 235.580186] Code: 90 8b 04 25 00 d3 5f ff f6 c4 10 75 f2 c1 e7 18 89 3c 25 10 d3 5f ff 89 f0 09 d0 80 ce 04 83 fe 02 0f 44 c2 89 04 25 00 d3 5f ff <c9> c3 48 8b 05 e4 55 cc 00 89 55 f4 89 75 f8 89 7d fc ff 90 18
----------------------------------------
----------------------------------------
00000000000f4810 <smp_call_function_many>:
* hardware interrupt handler or from a bottom half handler. Preemption
* must be disabled when calling this function.
*/
void smp_call_function_many(const struct cpumask *mask,
smp_call_func_t func, void *info, bool wait)
(...snipped...)
static inline void arch_send_call_function_ipi_mask(const struct cpumask *mask)
{
smp_ops.send_call_func_ipi(mask);
f49ac: 49 8b 7d 08 mov 0x8(%r13),%rdi
f49b0: ff 15 00 00 00 00 callq *0x0(%rip) # f49b6 <smp_call_function_many+0x1a6>
}
/* Send a message to all CPUs in the map */
arch_send_call_function_ipi_mask(cfd->cpumask);
if (wait) {
f49b6: 80 7d d0 00 cmpb $0x0,-0x30(%rbp)
f49ba: ba ff ff ff ff mov $0xffffffff,%edx
f49bf: 0f 84 bd fe ff ff je f4882 <smp_call_function_many+0x72>
f49c5: 48 63 35 00 00 00 00 movslq 0x0(%rip),%rsi # f49cc <smp_call_function_many+0x1bc>
f49cc: 49 8b 7d 08 mov 0x8(%r13),%rdi
f49d0: 83 c2 01 add $0x1,%edx
f49d3: 48 63 d2 movslq %edx,%rdx
f49d6: e8 00 00 00 00 callq f49db <smp_call_function_many+0x1cb>
for_each_cpu(cpu, cfd->cpumask) {
f49db: 3b 05 00 00 00 00 cmp 0x0(%rip),%eax # f49e1 <smp_call_function_many+0x1d1>
f49e1: 89 c2 mov %eax,%edx
f49e3: 0f 8d 99 fe ff ff jge f4882 <smp_call_function_many+0x72>
struct call_single_data *csd;
csd = per_cpu_ptr(cfd->csd, cpu);
f49e9: 48 98 cltq
f49eb: 49 8b 4d 00 mov 0x0(%r13),%rcx
f49ef: 48 03 0c c5 00 00 00 add 0x0(,%rax,8),%rcx
f49f6: 00
* previous function call. For multi-cpu calls its even more interesting
* as we'll have to ensure no other cpu is observing our csd.
*/
static __always_inline void csd_lock_wait(struct call_single_data *csd)
{
smp_cond_acquire(!(csd->flags & CSD_FLAG_LOCK));
f49f7: f6 41 18 01 testb $0x1,0x18(%rcx)
f49fb: 74 08 je f4a05 <smp_call_function_many+0x1f5>
f49fd: f3 90 pause
f49ff: f6 41 18 01 testb $0x1,0x18(%rcx) /***** smp_call_function_many+0x1ef is here *****/
f4a03: 75 f8 jne f49fd <smp_call_function_many+0x1ed>
f4a05: eb be jmp f49c5 <smp_call_function_many+0x1b5>
&& !oops_in_progress && !early_boot_irqs_disabled);
/* Try to fastpath. So, what's a CPU they want? Ignoring this one. */
cpu = cpumask_first_and(mask, cpu_online_mask);
if (cpu == this_cpu)
cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
f4a07: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
f4a0e: 4c 89 fe mov %r15,%rsi
f4a11: 44 89 f7 mov %r14d,%edi
f4a14: e8 00 00 00 00 callq f4a19 <smp_call_function_many+0x209>
f4a19: 41 89 c5 mov %eax,%r13d
f4a1c: e9 58 fe ff ff jmpq f4879 <smp_call_function_many+0x69>
if (next_cpu == this_cpu)
next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
/* Fastpath: do that cpu by itself. */
----------------------------------------
Source code for OOM-killer torture test.
----------------------------------------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/wait.h>
static int memory_eater(void *unused)
{
char *buf = NULL;
unsigned long size = 0;
while (1) {
char *tmp = realloc(buf, size + 4096);
if (!tmp)
break;
buf = tmp;
buf[size] = 0;
size += 4096;
size %= 1048576;
}
kill(getpid(), SIGKILL);
return 0;
}
static void child(void)
{
char *stack = malloc(4096 * 2);
char from[128] = { };
char to[128] = { };
const pid_t pid = getpid();
unsigned char prev = 0;
int fd = open("/proc/self/oom_score_adj", O_WRONLY);
write(fd, "1000", 4);
close(fd);
snprintf(from, sizeof(from), "tgid=%u", pid);
prctl(PR_SET_NAME, (unsigned long) from, 0, 0, 0);
srand(pid);
snprintf(from, sizeof(from), "file.%u-0", pid);
fd = open(from, O_WRONLY | O_CREAT, 0600);
if (fd == EOF)
_exit(1);
if (clone(memory_eater, stack + 4096, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL) == -1)
_exit(1);
while (1) {
const unsigned char next = rand();
snprintf(from, sizeof(from), "file.%u-%u", pid, prev);
snprintf(to, sizeof(to), "file.%u-%u", pid, next);
prev = next;
rename(from, to);
write(fd, "", 1);
}
_exit(0);
}
int main(int argc, char *argv[])
{
if (chdir("/tmp"))
return 1;
if (fork() == 0) {
char *buf = NULL;
unsigned long size;
unsigned long i;
for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
char *cp = realloc(buf, size);
if (!cp) {
size >>= 1;
break;
}
buf = cp;
}
/* Will cause OOM due to overcommit */
for (i = 0; i < size; i += 4096)
buf[i] = 0;
while (1)
pause();
} else {
int children = 1024;
while (1) {
while (children > 0) {
switch (fork()) {
case 0:
child();
case -1:
sleep(1);
break;
default:
children--;
}
}
wait(NULL);
children++;
}
}
return 0;
}
----------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
* x86_64 Question: Are concurrent IPI requests safe?
2016-05-06 10:58 ` Tetsuo Handa
@ 2016-05-09 14:54 ` Tetsuo Handa
-1 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-09 14:54 UTC (permalink / raw)
To: peterz, tglx, mingo, akpm, mgorman, mhocko; +Cc: linux-kernel, linux-mm
Tetsuo Handa wrote:
> I'm hitting this bug while doing OOM-killer torture test, but I can't tell
> whether this is a mm bug.
I have come to think that this is an IPI handling bug (possibly even a
hardware-dependent one).
If I understand correctly, all "call a function with an argument" requests
queued in a per-CPU call_single_queue are processed by a single
CALL_FUNCTION_VECTOR IPI.
----------------------------------------
void smp_call_function_many(const struct cpumask *mask, smp_call_func_t func, void *info, bool wait) { /* kernel/smp.c */
for_each_cpu(cpu, cfd->cpumask) {
llist_add(&csd->llist, &per_cpu(call_single_queue, cpu));
}
arch_send_call_function_ipi_mask(cfd->cpumask) { /* arch/x86/include/asm/smp.h */
smp_ops.send_call_func_ipi(mask) {
void native_send_call_func_ipi(const struct cpumask *mask) { /* arch/x86/kernel/smp.c */
apic->send_IPI_mask(mask, CALL_FUNCTION_VECTOR) {
static void flat_send_IPI_mask(const struct cpumask *cpumask, int vector) { /* arch/x86/kernel/apic/apic_flat_64.c */
_flat_send_IPI_mask(mask, vector) {
__default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest) { /* arch/x86/kernel/apic/ipi.c */
__xapic_wait_icr_idle();
native_apic_mem_write(APIC_ICR2, cfg);
native_apic_mem_write(APIC_ICR, cfg);
}
}
}
}
}
}
}
for_each_cpu(cpu, cfd->cpumask) {
csd_lock_wait(csd);
}
}
----------------------------------------
----------------------------------------
__visible void smp_call_function_interrupt(struct pt_regs *regs) { /* arch/x86/kernel/smp.c */
ipi_entering_ack_irq();
__smp_call_function_interrupt() {
generic_smp_call_function_interrupt() {
void generic_smp_call_function_single_interrupt(void) { /* kernel/smp.c */
flush_smp_call_function_queue(true) {
head = this_cpu_ptr(&call_single_queue);
entry = llist_del_all(head);
llist_for_each_entry_safe(csd, csd_next, entry, llist) {
func(info);
csd_unlock(csd);
}
}
}
}
}
exiting_irq();
}
----------------------------------------
Therefore, concurrent on_each_cpu_mask() calls with wait == true are not a
problem as long as llist_del_all() is called from smp_call_function_interrupt()
after llist_add() has been called from smp_call_function_many().
Since SysRq-l shows that multiple CPUs are spinning in csd_lock_wait(),
I checked whether llist_del_all() is called after llist_add() using the
debug printk() patch shown below.
----------------------------------------
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index e513940..6eb9f79 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -237,6 +237,8 @@ static DECLARE_WORK(sysrq_showallcpus, sysrq_showregs_othercpus);
static void sysrq_handle_showallcpus(int key)
{
+ extern void show_ipi_sequence(void);
+ show_ipi_sequence();
/*
* Fall back to the workqueue based printing if the
* backtrace printing did not succeed or the
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index afdcb7b..3744946 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -56,6 +56,7 @@
#include <linux/random.h>
#include <linux/trace_events.h>
#include <linux/suspend.h>
+extern void show_ipi_sequence(void);
#include "tree.h"
#include "rcu.h"
@@ -1346,6 +1347,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
* See Documentation/RCU/stallwarn.txt for info on how to debug
* RCU CPU stall warnings.
*/
+ show_ipi_sequence();
pr_err("INFO: %s detected stalls on CPUs/tasks:",
rsp->name);
print_cpu_stall_info_begin();
@@ -1412,6 +1414,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
* See Documentation/RCU/stallwarn.txt for info on how to debug
* RCU CPU stall warnings.
*/
+ show_ipi_sequence();
pr_err("INFO: %s self-detected stall on CPU", rsp->name);
print_cpu_stall_info_begin();
print_cpu_stall_info(rsp, smp_processor_id());
diff --git a/kernel/smp.c b/kernel/smp.c
index 7416544..0630eda 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -17,6 +17,21 @@
#include "smpboot.h"
+static DEFINE_PER_CPU(unsigned int, ipi_call_func_sequence);
+static DEFINE_PER_CPU(unsigned int, ipi_call_func_last_sequence[4]);
+
+void show_ipi_sequence(void)
+{
+ int cpu;
+ for_each_online_cpu(cpu) {
+ printk(KERN_ERR "IPI(%d): last_requested=%u,%u last_responded=%u,%u\n",
+ cpu, per_cpu_ptr(ipi_call_func_last_sequence, cpu)[0],
+ per_cpu_ptr(ipi_call_func_last_sequence, cpu)[2],
+ per_cpu_ptr(ipi_call_func_last_sequence, cpu)[1],
+ per_cpu_ptr(ipi_call_func_last_sequence, cpu)[3]);
+ }
+}
+
enum {
CSD_FLAG_LOCK = 0x01,
CSD_FLAG_SYNCHRONOUS = 0x02,
@@ -218,7 +233,9 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
WARN_ON(!irqs_disabled());
head = this_cpu_ptr(&call_single_queue);
+ this_cpu_write(ipi_call_func_last_sequence[1], this_cpu_inc_return(ipi_call_func_sequence));
entry = llist_del_all(head);
+ this_cpu_write(ipi_call_func_last_sequence[3], this_cpu_inc_return(ipi_call_func_sequence));
entry = llist_reverse_order(entry);
/* There shouldn't be any pending callbacks on an offline CPU. */
@@ -452,7 +469,9 @@ void smp_call_function_many(const struct cpumask *mask,
csd->flags |= CSD_FLAG_SYNCHRONOUS;
csd->func = func;
csd->info = info;
+ this_cpu_write(ipi_call_func_last_sequence[0], this_cpu_inc_return(ipi_call_func_sequence));
llist_add(&csd->llist, &per_cpu(call_single_queue, cpu));
+ this_cpu_write(ipi_call_func_last_sequence[2], this_cpu_inc_return(ipi_call_func_sequence));
}
/* Send a message to all CPUs in the map */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2932f3e9..deb3243 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4404,6 +4404,8 @@ void show_workqueue_state(void)
struct worker_pool *pool;
unsigned long flags;
int pi;
+ extern void show_ipi_sequence(void);
+ show_ipi_sequence();
rcu_read_lock_sched();
----------------------------------------
This debug printk() patch prints sequence numbers in the
IPI($cpu): last_requested=$v0,$v2 last_responded=$v1,$v3
format, and the expected invariant is $v0 == $v2 - 1 && $v1 == $v3 - 1 && $v2 < $v1
for most of the time.
However, I sometimes observe $v2 > $v1 when I hit this bug.
(Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160509.txt.xz .)
----------------------------------------
[ 120.469451] IPI(0): last_requested=21491,21492 last_responded=21493,21494
[ 120.471484] IPI(1): last_requested=10485,10486 last_responded=10489,10490
[ 120.473622] IPI(2): last_requested=11117,11118 last_responded=11111,11112
[ 120.476205] IPI(3): last_requested=9663,9664 last_responded=9667,9668
[ 133.889461] IPI(0): last_requested=21491,21492 last_responded=21493,21494
[ 133.891353] IPI(1): last_requested=10485,10486 last_responded=10489,10490
[ 133.893211] IPI(2): last_requested=11117,11118 last_responded=11111,11112
[ 133.895008] IPI(3): last_requested=9663,9664 last_responded=9667,9668
[ 150.550605] IPI(0): last_requested=21491,21492 last_responded=21493,21494
[ 150.552350] IPI(1): last_requested=10485,10486 last_responded=10489,10490
[ 150.554114] IPI(2): last_requested=11117,11118 last_responded=11111,11112
[ 150.555865] IPI(3): last_requested=9663,9664 last_responded=9667,9668
----------------------------------------
This means that llist_del_all() is NOT called after llist_add() is called on
some CPUs. This suggests that IPI requests are dropped (or cannot be processed)
for some reason.
Now I have a question about __default_send_IPI_dest_field().
----------------------------------------
void __default_send_IPI_dest_field(unsigned int mask, int vector, unsigned int dest)
{
unsigned long cfg;
/*
* Wait for idle.
*/
if (unlikely(vector == NMI_VECTOR))
safe_apic_wait_icr_idle();
else
__xapic_wait_icr_idle();
/*
* prepare target chip field
*/
cfg = __prepare_ICR2(mask);
native_apic_mem_write(APIC_ICR2, cfg);
/*
* program the ICR
*/
cfg = __prepare_ICR(0, vector, dest);
/*
* Send the IPI. The write to APIC_ICR fires this off.
*/
native_apic_mem_write(APIC_ICR, cfg);
}
static inline void __xapic_wait_icr_idle(void)
{
while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
cpu_relax();
}
static inline u32 native_apic_mem_read(u32 reg)
{
return *((volatile u32 *)(APIC_BASE + reg));
}
static inline void native_apic_mem_write(u32 reg, u32 v)
{
volatile u32 *addr = (volatile u32 *)(APIC_BASE + reg);
alternative_io("movl %0, %P1", "xchgl %0, %P1", X86_BUG_11AP,
ASM_OUTPUT2("=r" (v), "=m" (*addr)),
ASM_OUTPUT2("0" (v), "m" (*addr)));
}
----------------------------------------
It seems to me that APIC_BASE, APIC_ICR and APIC_ICR_BUSY are all constants
regardless of the calling CPU. Thus, native_apic_mem_read() and
native_apic_mem_write() operate on a globally shared constant memory
address, and __xapic_wait_icr_idle() makes its decision based on that
globally shared address. Am I right?
Then, what happens if two CPUs call native_apic_mem_write(APIC_ICR) at the
same time, one with vector == CALL_FUNCTION_VECTOR and the other with
vector != CALL_FUNCTION_VECTOR? Since I can't find any exclusion control
between CPUs here, could the native_apic_mem_write() for CALL_FUNCTION_VECTOR
be lost due to a concurrent !CALL_FUNCTION_VECTOR request?
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-09 14:54 ` Tetsuo Handa
@ 2016-05-09 16:54 ` Thomas Gleixner
2016-05-11 13:19 ` Tetsuo Handa
-1 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2016-05-09 16:54 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: peterz, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
On Mon, 9 May 2016, Tetsuo Handa wrote:
>
> It seems to me that APIC_BASE APIC_ICR APIC_ICR_BUSY are all constant
> regardless of calling cpu. Thus, native_apic_mem_read() and
> native_apic_mem_write() are using globally shared constant memory
> address and __xapic_wait_icr_idle() is making decision based on
> globally shared constant memory address. Am I right?
No. The APIC address space is per cpu. It's the same address but it's always
accessing the local APIC of the cpu on which it is called.
Thanks,
tglx
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-09 16:54 ` Thomas Gleixner
@ 2016-05-11 13:19 ` Tetsuo Handa
0 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-11 13:19 UTC (permalink / raw)
To: tglx; +Cc: peterz, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
Thomas Gleixner wrote:
> On Mon, 9 May 2016, Tetsuo Handa wrote:
> >
> > It seems to me that APIC_BASE APIC_ICR APIC_ICR_BUSY are all constant
> > regardless of calling cpu. Thus, native_apic_mem_read() and
> > native_apic_mem_write() are using globally shared constant memory
> > address and __xapic_wait_icr_idle() is making decision based on
> > globally shared constant memory address. Am I right?
>
> No. The APIC address space is per cpu. It's the same address but it's always
> accessing the local APIC of the cpu on which it is called.
Same address but per CPU magic. I see.
Now I'm testing with CONFIG_TRACE_IRQFLAGS=y, and the irq event stamp shows
that hardirqs are disabled on two CPUs when I hit this bug. It seems to me
that this bug is triggered when two CPUs concurrently call
smp_call_function_many() with wait == true.
(Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160511.txt.xz .)
----------------------------------------
[ 180.434620] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [tgid=13646:13646]
[ 180.434622] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [kswapd0:60]
[ 180.434646] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter coretemp pcspkr sg vmw_vmci i2c_piix4 ip_tables sd_mod ata_generic pata_acpi serio_raw mptspi scsi_transport_spi mptscsih vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptbase ttm drm ahci e1000 libahci i2c_core ata_piix libata
[ 180.434646] irq event stamp: 5324978
[ 180.434649] hardirqs last enabled at (5324977): [<ffff88007860f990>] 0xffff88007860f990
[ 180.434650] hardirqs last disabled at (5324978): [<ffff88007860f990>] 0xffff88007860f990
[ 180.434655] softirqs last enabled at (5324976): [<ffffffff8107484a>] __do_softirq+0x21a/0x5a0
[ 180.434656] softirqs last disabled at (5324971): [<ffffffff81074ee5>] irq_exit+0x105/0x120
[ 180.434658] CPU: 3 PID: 60 Comm: kswapd0 Not tainted 4.6.0-rc7-next-20160511 #426
[ 180.434659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 180.434659] task: ffff88007a046440 ti: ffff88007860c000 task.ti: ffff88007860c000
[ 180.434665] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 180.434666] RSP: 0000:ffff88007860f950 EFLAGS: 00000202
[ 180.434667] RAX: 0000000000000000 RBX: ffff88007f8d8880 RCX: ffff88007f81d8f8
[ 180.434667] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88007f8d8888
[ 180.434667] RBP: ffff88007860f990 R08: 0000000000000005 R09: 0000000000000000
[ 180.434668] R10: 0000000000000001 R11: ffff88007a046cb0 R12: 0000000000000000
[ 180.434668] R13: 0000000000000008 R14: ffffffff8105b3e0 R15: ffff88007860f9a8
[ 180.434669] FS: 0000000000000000(0000) GS:ffff88007f8c0000(0000) knlGS:0000000000000000
[ 180.434682] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 180.434687] CR2: 00005558fa8a7643 CR3: 000000007aed0000 CR4: 00000000001406e0
[ 180.434697] Stack:
[ 180.434699] 0000000000018840 0100000000000001 ffff88007f8d8888 ffffffffffffffff
[ 180.434700] 0000000000000000 0000000000000000 ffff88007a047688 00000000ffffffff
[ 180.434701] ffff88007860f9f0 ffffffff8105bdcf ffffea0001ad3640 0000000000000000
[ 180.434702] Call Trace:
[ 180.434706] [<ffffffff8105bdcf>] native_flush_tlb_others+0x1cf/0x360
[ 180.434709] [<ffffffff811c87f4>] try_to_unmap_flush+0xb4/0x2a0
[ 180.434712] [<ffffffff8119adac>] shrink_page_list+0x4fc/0xb00
[ 180.434714] [<ffffffff8119bae2>] shrink_inactive_list+0x202/0x630
[ 180.434715] [<ffffffff8119c858>] shrink_zone_memcg+0x5a8/0x720
[ 180.434718] [<ffffffff810c9f00>] ? match_held_lock+0x180/0x1d0
[ 180.434719] [<ffffffff8119caa4>] shrink_zone+0xd4/0x2f0
[ 180.434720] [<ffffffff8119e486>] kswapd+0x4e6/0xb30
[ 180.434722] [<ffffffff8119dfa0>] ? mem_cgroup_shrink_node_zone+0x3e0/0x3e0
[ 180.434724] [<ffffffff81093a7e>] kthread+0xee/0x110
[ 180.434727] [<ffffffff8172ae3f>] ret_from_fork+0x1f/0x40
[ 180.434729] [<ffffffff81093990>] ? kthread_create_on_node+0x220/0x220
[ 180.434741] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 180.532771] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_filter coretemp pcspkr sg vmw_vmci i2c_piix4 ip_tables sd_mod ata_generic pata_acpi serio_raw mptspi scsi_transport_spi mptscsih vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptbase ttm drm ahci e1000 libahci i2c_core ata_piix libata
[ 180.547386] irq event stamp: 601148
[ 180.548951] hardirqs last enabled at (601147): [<ffff880078cffa00>] 0xffff880078cffa00
[ 180.551359] hardirqs last disabled at (601148): [<ffff880078cffa00>] 0xffff880078cffa00
[ 180.553738] softirqs last enabled at (601146): [<ffffffff8107484a>] __do_softirq+0x21a/0x5a0
[ 180.556190] softirqs last disabled at (601141): [<ffffffff81074ee5>] irq_exit+0x105/0x120
[ 180.558547] CPU: 2 PID: 13646 Comm: tgid=13646 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 180.561046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 180.563802] task: ffff880077ad1940 ti: ffff880078cfc000 task.ti: ffff880078cfc000
[ 180.565984] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 180.568517] RSP: 0000:ffff880078cff9c0 EFLAGS: 00000202
[ 180.570338] RAX: 0000000000000000 RBX: ffff88007f898880 RCX: ffff88007f81d8d8
[ 180.572500] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88007f898888
[ 180.574628] RBP: ffff880078cffa00 R08: 0000000000000009 R09: 0000000000000000
[ 180.576818] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 180.578907] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 180.581040] FS: 00007f497770a740(0000) GS:ffff88007f880000(0000) knlGS:0000000000000000
[ 180.583320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 180.585188] CR2: 00007f49772ac8b5 CR3: 0000000078523000 CR4: 00000000001406e0
[ 180.587333] Stack:
[ 180.588536] 0000000000018840 0100000000000001 ffff88007f898888 ffffffff82b50168
[ 180.590831] 0000000000000002 ffffffff81188420 0000000000000000 0000000000000001
[ 180.593136] ffff880078cffa38 ffffffff8111079d 0000000000000008 0000000000000000
[ 180.595487] Call Trace:
[ 180.596782] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 180.598761] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 180.600609] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 180.602480] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 180.604450] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 180.606376] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 180.608279] [<ffffffff81192061>] __do_page_cache_readahead+0x111/0x380
[ 180.610292] [<ffffffff811920ba>] ? __do_page_cache_readahead+0x16a/0x380
[ 180.612333] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 180.614268] [<ffffffff81182bb3>] filemap_fault+0x2f3/0x680
[ 180.616104] [<ffffffff8130dccd>] ? xfs_filemap_fault+0x5d/0x1f0
[ 180.618008] [<ffffffff810c7c6d>] ? down_read_nested+0x2d/0x50
[ 180.619884] [<ffffffff8131e7c2>] ? xfs_ilock+0x1e2/0x2d0
[ 180.621691] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 180.623567] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 180.625352] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 180.627272] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 180.629175] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 180.631072] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 180.632852] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 180.634550] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
----------------------------------------
Since the "hardirqs last enabled / disabled" entries point to the value of
the RBP register, I can't identify the location where hardirqs were disabled.
But why can't the kernel report the location where hardirqs were last disabled?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 13:19 ` Tetsuo Handa
@ 2016-05-11 13:39 ` Peter Zijlstra
-1 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2016-05-11 13:39 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: tglx, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
On Wed, May 11, 2016 at 10:19:16PM +0900, Tetsuo Handa wrote:
> [ 180.434659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
can you reproduce on real hardware?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 13:39 ` Peter Zijlstra
@ 2016-05-11 14:09 ` Tetsuo Handa
-1 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-11 14:09 UTC (permalink / raw)
To: peterz; +Cc: tglx, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 10:19:16PM +0900, Tetsuo Handa wrote:
> > [ 180.434659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
>
> can you reproduce on real hardware?
>
Unfortunately, I don't have real hardware for running development kernels.
My Linux environment is limited to 4 CPUs / 1024MB or 2048MB RAM running
as a VMware guest on Windows. Can somebody try a KVM environment with
4 CPUs / 1024MB or 2048MB RAM, with only a plain /dev/sda1 partition
formatted as XFS?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 13:19 ` Tetsuo Handa
@ 2016-05-11 14:13 ` Thomas Gleixner
-1 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2016-05-11 14:13 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: peterz, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
On Wed, 11 May 2016, Tetsuo Handa wrote:
> Thomas Gleixner wrote:
> > On Mon, 9 May 2016, Tetsuo Handa wrote:
> > >
> > > It seems to me that APIC_BASE APIC_ICR APIC_ICR_BUSY are all constant
> > > regardless of calling cpu. Thus, native_apic_mem_read() and
> > > native_apic_mem_write() are using globally shared constant memory
> > > address and __xapic_wait_icr_idle() is making decision based on
> > > globally shared constant memory address. Am I right?
> >
> > No. The APIC address space is per cpu. It's the same address but it's always
> > accessing the local APIC of the cpu on which it is called.
>
> Same address but per CPU magic. I see.
>
> Now, I'm trying with CONFIG_TRACE_IRQFLAGS=y and I can observe that
> irq event stamp shows that hardirqs are disabled for two CPUs when I hit
> this bug. It seems to me that this bug is triggered when two CPUs are
> concurrently calling smp_call_function_many() with wait == true.
> [ 180.434649] hardirqs last enabled at (5324977): [<ffff88007860f990>] 0xffff88007860f990
> [ 180.434650] hardirqs last disabled at (5324978): [<ffff88007860f990>] 0xffff88007860f990
Those addresses are on the stack !?! That makes no sense whatsoever.
> [ 180.434659] task: ffff88007a046440 ti: ffff88007860c000 task.ti: ffff88007860c000
> [ 180.434665] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
> [ 180.434666] RSP: 0000:ffff88007860f950 EFLAGS: 00000202
And on this CPU interrupts are enabled, because the IF bit (9) in EFLAGS is set.
> [ 180.548951] hardirqs last enabled at (601147): [<ffff880078cffa00>] 0xffff880078cffa00
> [ 180.551359] hardirqs last disabled at (601148): [<ffff880078cffa00>] 0xffff880078cffa00
Equally crap.
> [ 180.563802] task: ffff880077ad1940 ti: ffff880078cfc000 task.ti: ffff880078cfc000
> [ 180.565984] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
> [ 180.568517] RSP: 0000:ffff880078cff9c0 EFLAGS: 00000202
And again interrupts are enabled.
Thanks,
tglx
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 14:09 ` Tetsuo Handa
@ 2016-05-11 14:33 ` Thomas Gleixner
2016-05-11 15:19 ` Tetsuo Handa
-1 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2016-05-11 14:33 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: peterz, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
On Wed, 11 May 2016, Tetsuo Handa wrote:
> Peter Zijlstra wrote:
> > On Wed, May 11, 2016 at 10:19:16PM +0900, Tetsuo Handa wrote:
> > > [ 180.434659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> >
> > can you reproduce on real hardware?
> >
> Unfortunately, I don't have a real hardware to run development kernels.
>
> My Linux environment is limited to 4 CPUs / 1024MB or 2048MB RAM running
> as a VMware guest on Windows. Can somebody try KVM environment with
> 4 CPUs / 1024MB or 2048MB RAM whith partition only plain /dev/sda1
> formatted as XFS?
Can you trigger a back trace on all cpus when the watchdog triggers?
Thanks,
tglx
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 14:33 ` Thomas Gleixner
@ 2016-05-11 15:19 ` Tetsuo Handa
0 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-11 15:19 UTC (permalink / raw)
To: tglx; +Cc: peterz, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
Thomas Gleixner wrote:
> On Wed, 11 May 2016, Tetsuo Handa wrote:
> > Peter Zijlstra wrote:
> > > On Wed, May 11, 2016 at 10:19:16PM +0900, Tetsuo Handa wrote:
> > > > [ 180.434659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> > >
> > > can you reproduce on real hardware?
> > >
> > Unfortunately, I don't have a real hardware to run development kernels.
> >
> > My Linux environment is limited to 4 CPUs / 1024MB or 2048MB RAM running
> > as a VMware guest on Windows. Can somebody try KVM environment with
> > 4 CPUs / 1024MB or 2048MB RAM whith partition only plain /dev/sda1
> > formatted as XFS?
>
> Can you trigger a back trace on all cpus when the watchdog triggers?
>
"a back trace on all cpus" means SysRq-l ? Then, here it is.
We can see a CPU with EFLAGS: 00000082 or EFLAGS: 00000097 which is at
__rwsem_do_wake().
Well, I came to feel that this is caused by down_write_killable() bug.
I guess we should fix down_write_killable() bug first.
( Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160511-2.txt.xz . )
----------------------------------------
[ 164.412391] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tgid=5368:5372]
[ 164.412397] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [nmbd:3765]
(...snipped...)
[ 174.740085] sysrq: SysRq : Show backtrace of all active CPUs
[ 174.741969] Sending NMI to all CPUs:
[ 174.743474] NMI backtrace for cpu 0
[ 174.744834] CPU: 0 PID: 5313 Comm: tgid=5309 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 174.747223] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 174.749993] task: ffff880073e91940 ti: ffff880079b88000 task.ti: ffff880079b88000
[ 174.752321] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 174.754804] RSP: 0000:ffff880079b8bbb8 EFLAGS: 00000202
[ 174.756528] RAX: 0000000000000002 RBX: ffff88007f818880 RCX: ffff88007f89c238
[ 174.758673] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f818888
[ 174.761174] RBP: ffff880079b8bbf8 R08: 000000000000000c R09: 0000000000000000
[ 174.763583] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 174.765616] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 174.767622] FS: 00007fd2d1c82740(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
[ 174.769922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 174.771744] CR2: 00007f1069816541 CR3: 00000000657ec000 CR4: 00000000001406f0
[ 174.774472] Stack:
[ 174.775576] 0000000000018840 0100000000000001 ffff88007f818888 ffffffff82b50168
[ 174.777672] 0000000000000000 ffffffff81188420 0000000000000000 0000000000000001
[ 174.779768] ffff880079b8bc30 ffffffff8111079d 0000000000000008 0000000000000000
[ 174.781996] Call Trace:
[ 174.783490] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.785348] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 174.787503] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 174.789379] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 174.791292] [<ffffffff811e3259>] alloc_pages_vma+0xb9/0x2b0
[ 174.793399] [<ffffffff811be9e0>] handle_mm_fault+0x1430/0x1980
[ 174.795189] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 174.797012] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 174.798779] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 174.800461] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 174.802102] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 174.807003] NMI backtrace for cpu 3
[ 174.808419] CPU: 3 PID: 3765 Comm: nmbd Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 174.810860] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 174.813685] task: ffff880079c81940 ti: ffff880078d38000 task.ti: ffff880078d38000
[ 174.815857] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 174.818324] RSP: 0000:ffff880078d3ba88 EFLAGS: 00000202
[ 174.820163] RAX: 0000000000000002 RBX: ffff88007f8d8880 RCX: ffff88007f89d8f8
[ 174.822353] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f8d8888
[ 174.824503] RBP: ffff880078d3bac8 R08: 0000000000000005 R09: 0000000000000000
[ 174.826634] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 174.828762] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 174.830885] FS: 00007f72564d5840(0000) GS:ffff88007f8c0000(0000) knlGS:0000000000000000
[ 174.833195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 174.835061] CR2: 00007f0d7c0d7110 CR3: 000000007925d000 CR4: 00000000001406e0
[ 174.837160] Stack:
[ 174.838307] 0000000000018840 0100000000000001 ffff88007f8d8888 ffffffff82b50168
[ 174.840472] 0000000000000003 ffffffff81188420 0000000000000000 0000000000000001
[ 174.842625] ffff880078d3bb00 ffffffff8111079d 0000000000000008 0000000000000000
[ 174.844762] Call Trace:
[ 174.845987] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.847900] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 174.849686] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 174.851483] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 174.853646] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 174.855497] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 174.857860] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 174.859715] [<ffffffff81182d02>] filemap_fault+0x442/0x680
[ 174.861498] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 174.863624] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 174.865439] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 174.867120] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 174.868947] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 174.870763] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 174.872540] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 174.874684] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 174.876354] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 174.881275] NMI backtrace for cpu 1
[ 174.882668] CPU: 1 PID: 5372 Comm: tgid=5368 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 174.885226] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 174.888341] task: ffff8800038c6440 ti: ffff880003af8000 task.ti: ffff880003af8000
[ 174.890688] RIP: 0010:[<ffffffff8104594b>] [<ffffffff8104594b>] __default_send_IPI_dest_field+0x3b/0x60
[ 174.893307] RSP: 0000:ffff88007f843b90 EFLAGS: 00000046
[ 174.895106] RAX: 0000000000000400 RBX: 0000000000000001 RCX: 0000000000000001
[ 174.897232] RDX: 0000000000000400 RSI: 0000000000000002 RDI: 0000000002000000
[ 174.899502] RBP: ffff88007f843ba0 R08: 000000000000000f R09: 0000000000000000
[ 174.901658] R10: 0000000000000001 R11: 0000000000001d1e R12: 000000000000a12e
[ 174.904146] R13: ffffffff81cf59a0 R14: 0000000000000002 R15: 0000000000000082
[ 174.906259] FS: 00007fd2d1c82740(0000) GS:ffff88007f840000(0000) knlGS:0000000000000000
[ 174.908544] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 174.910382] CR2: 00007fd2d176c480 CR3: 0000000005f64000 CR4: 00000000001406e0
[ 174.912446] Stack:
[ 174.913860] 000000007f843ba0 0000000200000002 ffff88007f843be0 ffffffff81045a37
[ 174.915947] ffff88007f843bc0 0000000000000001 0000000000000001 ffffffff81046b10
[ 174.918031] 0000000000000007 ffff88007ace3100 ffff88007f843bf0 ffffffff81046b26
[ 174.920220] Call Trace:
[ 174.921448] <IRQ> [<ffffffff81045a37>] default_send_IPI_mask_sequence_phys+0x67/0xe0
[ 174.923677] [<ffffffff81046b10>] ? irq_force_complete_move+0x100/0x100
[ 174.926005] [<ffffffff81046b26>] nmi_raise_cpu_backtrace+0x16/0x20
[ 174.928079] [<ffffffff813b3104>] nmi_trigger_all_cpu_backtrace+0x64/0xfc
[ 174.930088] [<ffffffff81046b84>] arch_trigger_all_cpu_backtrace+0x14/0x20
[ 174.932193] [<ffffffff8146a5ee>] sysrq_handle_showallcpus+0xe/0x10
[ 174.934125] [<ffffffff8146ad16>] __handle_sysrq+0x136/0x220
[ 174.935914] [<ffffffff8146abe0>] ? __sysrq_get_key_op+0x30/0x30
[ 174.937753] [<ffffffff8146b19e>] sysrq_filter+0x36e/0x3b0
[ 174.939497] [<ffffffff81566172>] input_to_handler+0x52/0x100
[ 174.941283] [<ffffffff81568def>] input_pass_values.part.5+0x1ff/0x280
[ 174.943228] [<ffffffff81568bf0>] ? input_register_handler+0xc0/0xc0
[ 174.945131] [<ffffffff815695a4>] input_handle_event+0x114/0x4e0
[ 174.946947] [<ffffffff815699bf>] input_event+0x4f/0x70
[ 174.948638] [<ffffffff8157113d>] atkbd_interrupt+0x5bd/0x6a0
[ 174.950409] [<ffffffff81563531>] serio_interrupt+0x41/0x80
[ 174.952183] [<ffffffff8156412a>] i8042_interrupt+0x1da/0x3a0
[ 174.954407] [<ffffffff810e0949>] handle_irq_event_percpu+0x69/0x470
[ 174.956320] [<ffffffff810e0d84>] handle_irq_event+0x34/0x60
[ 174.958205] [<ffffffff810e41d3>] handle_edge_irq+0xa3/0x160
[ 174.960187] [<ffffffff8101f647>] handle_irq+0xf7/0x170
[ 174.961870] [<ffffffff8101eaca>] do_IRQ+0x6a/0x130
[ 174.963462] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.969241] [<ffffffff8172b593>] common_interrupt+0x93/0x93
[ 174.971035] <EOI> [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.973042] [<ffffffff811105bf>] ? smp_call_function_many+0x21f/0x2c0
[ 174.975222] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.977089] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 174.979940] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 174.981739] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 174.983945] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 174.985740] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 174.987576] [<ffffffff81192061>] __do_page_cache_readahead+0x111/0x380
[ 174.989462] [<ffffffff811920ba>] ? __do_page_cache_readahead+0x16a/0x380
[ 174.991327] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 174.993432] [<ffffffff81182bb3>] filemap_fault+0x2f3/0x680
[ 174.995088] [<ffffffff8130dccd>] ? xfs_filemap_fault+0x5d/0x1f0
[ 174.996808] [<ffffffff810c7c6d>] ? down_read_nested+0x2d/0x50
[ 174.998586] [<ffffffff8131e7c2>] ? xfs_ilock+0x1e2/0x2d0
[ 175.000279] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 175.002649] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 175.004203] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 175.005888] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 175.007575] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 175.009203] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 175.010840] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 175.012346] Code: 90 8b 04 25 00 d3 5f ff f6 c4 10 75 f2 c1 e7 18 89 3c 25 10 d3 5f ff 89 f0 09 d0 80 ce 04 83 fe 02 0f 44 c2 89 04 25 00 d3 5f ff <c9> c3 48 8b 05 74 be ca 00 89 55 f4 89 75 f8 89 7d fc ff 90 18
[ 175.017059] NMI backtrace for cpu 2
[ 175.018546] CPU: 2 PID: 5207 Comm: tgid=5207 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 175.021220] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 175.025744] task: ffff880007860040 ti: ffff88005d7cc000 task.ti: ffff88005d7cc000
[ 175.027749] RIP: 0010:[<ffffffff810d09db>] [<ffffffff810d09db>] __rwsem_do_wake+0x5b/0x150
[ 175.029917] RSP: 0000:ffff88005d7cfc10 EFLAGS: 00000082
[ 175.031562] RAX: ffff8800795dbbc0 RBX: ffff880064645be0 RCX: ffffffff00000000
[ 175.033558] RDX: fffffffe00000000 RSI: 0000000000000001 RDI: ffffffffffffffff
[ 175.035618] RBP: ffff88005d7cfc38 R08: 0000000000000001 R09: 0000000000000000
[ 175.037932] R10: ffff880007860040 R11: ffff8800078608d8 R12: ffff880064645be0
[ 175.039920] R13: 0000000000000282 R14: ffff8800738ed068 R15: ffffea00016cb680
[ 175.041979] FS: 00007fd2d1c82740(0000) GS:ffff88007f880000(0000) knlGS:0000000000000000
[ 175.044474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 175.046204] CR2: 00007f7252d7b540 CR3: 000000007a495000 CR4: 00000000001406e0
[ 175.048219] Stack:
[ 175.049286] ffff880064645be0 ffff880064645bf8 0000000000000282 ffff8800738ed068
[ 175.051392] ffffea00016cb680 ffff88005d7cfc68 ffffffff810d0b18 ffff880064645be0
[ 175.053477] ffff880078d6db40 ffff88005d7cfdb0 ffff8800738ed068 ffff88005d7cfcb0
[ 175.055587] Call Trace:
[ 175.056745] [<ffffffff810d0b18>] rwsem_wake+0x48/0xb0
[ 175.058402] [<ffffffff813bc99b>] call_rwsem_wake+0x1b/0x30
[ 175.060146] [<ffffffff810c7d80>] up_read+0x30/0x40
[ 175.061774] [<ffffffff8118283c>] __lock_page_or_retry+0x6c/0xf0
[ 175.063597] [<ffffffff81182a5c>] filemap_fault+0x19c/0x680
[ 175.065324] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 175.067080] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 175.069002] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 175.070705] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 175.072513] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 175.074323] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 175.076093] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 175.077784] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 175.079423] Code: 5f 5d c3 83 fe 02 0f 84 e5 00 00 00 be 01 00 00 00 48 b9 00 00 00 00 ff ff ff ff 48 c7 c7 ff ff ff ff 48 89 f2 f0 49 0f c1 14 24 <48> 39 ca 0f 8c c7 00 00 00 49 8b 54 24 08 be 01 00 00 00 4d 8d
(...snipped...)
[ 209.471203] sysrq: SysRq : Show backtrace of all active CPUs
[ 209.473421] Sending NMI to all CPUs:
[ 209.475059] NMI backtrace for cpu 0
[ 209.476875] CPU: 0 PID: 5313 Comm: tgid=5309 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.479592] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.482566] task: ffff880073e91940 ti: ffff880079b88000 task.ti: ffff880079b88000
[ 209.484837] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 209.487465] RSP: 0000:ffff880079b8bbb8 EFLAGS: 00000202
[ 209.489353] RAX: 0000000000000002 RBX: ffff88007f818880 RCX: ffff88007f89c238
[ 209.491539] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f818888
[ 209.493706] RBP: ffff880079b8bbf8 R08: 000000000000000c R09: 0000000000000000
[ 209.495889] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 209.498366] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 209.500459] FS: 00007fd2d1c82740(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
[ 209.502701] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.504539] CR2: 00007f1069816541 CR3: 00000000657ec000 CR4: 00000000001406f0
[ 209.508705] Stack:
[ 209.510195] 0000000000018840 0100000000000001 ffff88007f818888 ffffffff82b50168
[ 209.512492] 0000000000000000 ffffffff81188420 0000000000000000 0000000000000001
[ 209.514658] ffff880079b8bc30 ffffffff8111079d 0000000000000008 0000000000000000
[ 209.516853] Call Trace:
[ 209.518138] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.520060] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 209.521855] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 209.523842] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 209.525752] [<ffffffff811e3259>] alloc_pages_vma+0xb9/0x2b0
[ 209.527587] [<ffffffff811be9e0>] handle_mm_fault+0x1430/0x1980
[ 209.529484] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.531536] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.533310] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.535023] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.536666] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 209.542321] NMI backtrace for cpu 3
[ 209.543740] CPU: 3 PID: 3765 Comm: nmbd Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.546090] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.549111] task: ffff880079c81940 ti: ffff880078d38000 task.ti: ffff880078d38000
[ 209.551336] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 209.553848] RSP: 0000:ffff880078d3ba88 EFLAGS: 00000202
[ 209.555668] RAX: 0000000000000002 RBX: ffff88007f8d8880 RCX: ffff88007f89d8f8
[ 209.558128] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f8d8888
[ 209.561152] RBP: ffff880078d3bac8 R08: 0000000000000005 R09: 0000000000000000
[ 209.563334] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 209.565526] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 209.567984] FS: 00007f72564d5840(0000) GS:ffff88007f8c0000(0000) knlGS:0000000000000000
[ 209.570302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.573204] CR2: 00007f0d7c0d7110 CR3: 000000007925d000 CR4: 00000000001406e0
[ 209.575365] Stack:
[ 209.576556] 0000000000018840 0100000000000001 ffff88007f8d8888 ffffffff82b50168
[ 209.578709] 0000000000000003 ffffffff81188420 0000000000000000 0000000000000001
[ 209.580948] ffff880078d3bb00 ffffffff8111079d 0000000000000008 0000000000000000
[ 209.583196] Call Trace:
[ 209.584428] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.586703] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 209.588513] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 209.590445] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 209.592459] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 209.595874] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 209.597738] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 209.599601] [<ffffffff81182d02>] filemap_fault+0x442/0x680
[ 209.601363] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 209.603141] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 209.604944] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 209.607422] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 209.609281] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.611068] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.612826] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.614518] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.616155] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 209.621077] NMI backtrace for cpu 1
[ 209.622473] CPU: 1 PID: 5372 Comm: tgid=5368 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.624907] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.628064] task: ffff8800038c6440 ti: ffff880003af8000 task.ti: ffff880003af8000
[ 209.630234] RIP: 0010:[<ffffffff8104594b>] [<ffffffff8104594b>] __default_send_IPI_dest_field+0x3b/0x60
[ 209.632774] RSP: 0000:ffff88007f843b90 EFLAGS: 00000046
[ 209.634554] RAX: 0000000000000400 RBX: 0000000000000001 RCX: 0000000000000001
[ 209.636682] RDX: 0000000000000400 RSI: 0000000000000002 RDI: 0000000002000000
[ 209.639148] RBP: ffff88007f843ba0 R08: 000000000000000f R09: 0000000000000000
[ 209.641495] R10: 0000000000000001 R11: 0000000000001e88 R12: 000000000000a12e
[ 209.643584] R13: ffffffff81cf59a0 R14: 0000000000000002 R15: 0000000000000082
[ 209.645672] FS: 00007fd2d1c82740(0000) GS:ffff88007f840000(0000) knlGS:0000000000000000
[ 209.648320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.650338] CR2: 00007fd2d176c480 CR3: 0000000005f64000 CR4: 00000000001406e0
[ 209.656131] Stack:
[ 209.658919] 000000007f843ba0 0000000200000002 ffff88007f843be0 ffffffff81045a37
[ 209.665787] ffff88007f843bc0 0000000000000001 0000000000000001 ffffffff81046b10
[ 209.671430] 0000000000000007 ffff88007ace3100 ffff88007f843bf0 ffffffff81046b26
[ 209.675269] Call Trace:
[  209.676537]  <IRQ> [<ffffffff81045a37>] default_send_IPI_mask_sequence_phys+0x67/0xe0
[ 209.678825] [<ffffffff81046b10>] ? irq_force_complete_move+0x100/0x100
[ 209.680819] [<ffffffff81046b26>] nmi_raise_cpu_backtrace+0x16/0x20
[ 209.682741] [<ffffffff813b3104>] nmi_trigger_all_cpu_backtrace+0x64/0xfc
[ 209.684727] [<ffffffff81046b84>] arch_trigger_all_cpu_backtrace+0x14/0x20
[ 209.686744] [<ffffffff8146a5ee>] sysrq_handle_showallcpus+0xe/0x10
[ 209.689182] [<ffffffff8146ad16>] __handle_sysrq+0x136/0x220
[ 209.691013] [<ffffffff8146abe0>] ? __sysrq_get_key_op+0x30/0x30
[ 209.692861] [<ffffffff8146b19e>] sysrq_filter+0x36e/0x3b0
[ 209.694591] [<ffffffff81566172>] input_to_handler+0x52/0x100
[ 209.696354] [<ffffffff81568def>] input_pass_values.part.5+0x1ff/0x280
[ 209.698259] [<ffffffff81568bf0>] ? input_register_handler+0xc0/0xc0
[ 209.700143] [<ffffffff815695a4>] input_handle_event+0x114/0x4e0
[ 209.701948] [<ffffffff815699bf>] input_event+0x4f/0x70
[ 209.703670] [<ffffffff8157113d>] atkbd_interrupt+0x5bd/0x6a0
[ 209.705451] [<ffffffff81563531>] serio_interrupt+0x41/0x80
[ 209.707256] [<ffffffff8156412a>] i8042_interrupt+0x1da/0x3a0
[ 209.709449] [<ffffffff810e0949>] handle_irq_event_percpu+0x69/0x470
[ 209.711598] [<ffffffff810e0d84>] handle_irq_event+0x34/0x60
[ 209.713558] [<ffffffff810e41d3>] handle_edge_irq+0xa3/0x160
[ 209.715308] [<ffffffff8101f647>] handle_irq+0xf7/0x170
[ 209.716966] [<ffffffff8101eaca>] do_IRQ+0x6a/0x130
[ 209.718558] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.720407] [<ffffffff8172b593>] common_interrupt+0x93/0x93
[  209.722147]  <EOI> [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.724180] [<ffffffff811105bf>] ? smp_call_function_many+0x21f/0x2c0
[ 209.726093] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.728009] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 209.729760] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 209.731481] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 209.733322] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 209.735077] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 209.736942] [<ffffffff81192061>] __do_page_cache_readahead+0x111/0x380
[ 209.738924] [<ffffffff811920ba>] ? __do_page_cache_readahead+0x16a/0x380
[ 209.740849] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 209.742584] [<ffffffff81182bb3>] filemap_fault+0x2f3/0x680
[ 209.744241] [<ffffffff8130dccd>] ? xfs_filemap_fault+0x5d/0x1f0
[ 209.745991] [<ffffffff810c7c6d>] ? down_read_nested+0x2d/0x50
[ 209.748244] [<ffffffff8131e7c2>] ? xfs_ilock+0x1e2/0x2d0
[ 209.749849] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 209.751538] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 209.753159] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 209.754844] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.756518] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.758136] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.759695] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.761195] Code: 90 8b 04 25 00 d3 5f ff f6 c4 10 75 f2 c1 e7 18 89 3c 25 10 d3 5f ff 89 f0 09 d0 80 ce 04 83 fe 02 0f 44 c2 89 04 25 00 d3 5f ff <c9> c3 48 8b 05 74 be ca 00 89 55 f4 89 75 f8 89 7d fc ff 90 18
[ 209.765816] NMI backtrace for cpu 2
[ 209.767392] CPU: 2 PID: 5207 Comm: tgid=5207 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.769738] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.772854] task: ffff880007860040 ti: ffff88005d7cc000 task.ti: ffff88005d7cc000
[ 209.775300] RIP: 0010:[<ffffffff810d0ab4>] [<ffffffff810d0ab4>] __rwsem_do_wake+0x134/0x150
[ 209.777929] RSP: 0000:ffff88005d7cfc10 EFLAGS: 00000097
[ 209.779555] RAX: ffff8800795dbbc0 RBX: ffff880064645be0 RCX: ffffffff00000000
[ 209.781522] RDX: fffffffe00000001 RSI: 0000000000000001 RDI: ffffffffffffffff
[ 209.783490] RBP: ffff88005d7cfc38 R08: 0000000000000001 R09: 0000000000000000
[ 209.785533] R10: ffff880007860040 R11: ffff8800078608d8 R12: ffff880064645be0
[ 209.787876] R13: 0000000000000282 R14: ffff8800738ed068 R15: ffffea00016cb680
[ 209.789858] FS: 00007fd2d1c82740(0000) GS:ffff88007f880000(0000) knlGS:0000000000000000
[ 209.792025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.793751] CR2: 00007f7252d7b540 CR3: 000000007a495000 CR4: 00000000001406e0
[ 209.795772] Stack:
[ 209.796839] ffff880064645be0 ffff880064645bf8 0000000000000282 ffff8800738ed068
[ 209.798923] ffffea00016cb680 ffff88005d7cfc68 ffffffff810d0b18 ffff880064645be0
[ 209.800992] ffff880078d6db40 ffff88005d7cfdb0 ffff8800738ed068 ffff88005d7cfcb0
[ 209.803058] Call Trace:
[ 209.804220] [<ffffffff810d0b18>] rwsem_wake+0x48/0xb0
[ 209.806152] [<ffffffff813bc99b>] call_rwsem_wake+0x1b/0x30
[ 209.808068] [<ffffffff810c7d80>] up_read+0x30/0x40
[ 209.809769] [<ffffffff8118283c>] __lock_page_or_retry+0x6c/0xf0
[ 209.811709] [<ffffffff81182a5c>] filemap_fault+0x19c/0x680
[ 209.814493] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 209.816282] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 209.818121] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 209.819798] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 209.821614] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.823405] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.825154] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.827191] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.828843] Code: a1 48 8b 78 10 e8 ad 06 fd ff 5b 4c 89 e0 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 c2 31 f6 e9 43 ff ff ff 48 89 fa f0 49 0f c1 14 24 <83> ea 01 0f 84 15 ff ff ff e9 e3 fe ff ff 66 66 66 66 66 2e 0f
----------------------------------------
* Re: x86_64 Question: Are concurrent IPI requests safe?
@ 2016-05-11 15:19 ` Tetsuo Handa
0 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-11 15:19 UTC (permalink / raw)
To: tglx; +Cc: peterz, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
Thomas Gleixner wrote:
> On Wed, 11 May 2016, Tetsuo Handa wrote:
> > Peter Zijlstra wrote:
> > > On Wed, May 11, 2016 at 10:19:16PM +0900, Tetsuo Handa wrote:
> > > > [ 180.434659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> > >
> > > can you reproduce on real hardware?
> > >
> > Unfortunately, I don't have real hardware on which to run development kernels.
> >
> > My Linux environment is limited to 4 CPUs / 1024MB or 2048MB RAM running
> > as a VMware guest on Windows. Can somebody try a KVM environment with
> > 4 CPUs / 1024MB or 2048MB RAM, with only a plain /dev/sda1 partition
> > formatted as XFS?
>
> Can you trigger a back trace on all cpus when the watchdog triggers?
>
Does "a back trace on all cpus" mean SysRq-l? If so, here it is.
We can see a CPU with EFLAGS: 00000082 or EFLAGS: 00000097 which is spinning
at __rwsem_do_wake().
Well, I have come to suspect that this is caused by a down_write_killable()
bug. I guess we should fix the down_write_killable() bug first.
( Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160511-2.txt.xz . )
----------------------------------------
[ 164.412391] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tgid=5368:5372]
[ 164.412397] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [nmbd:3765]
(...snipped...)
[ 174.740085] sysrq: SysRq : Show backtrace of all active CPUs
[ 174.741969] Sending NMI to all CPUs:
[ 174.743474] NMI backtrace for cpu 0
[ 174.744834] CPU: 0 PID: 5313 Comm: tgid=5309 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 174.747223] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 174.749993] task: ffff880073e91940 ti: ffff880079b88000 task.ti: ffff880079b88000
[ 174.752321] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 174.754804] RSP: 0000:ffff880079b8bbb8 EFLAGS: 00000202
[ 174.756528] RAX: 0000000000000002 RBX: ffff88007f818880 RCX: ffff88007f89c238
[ 174.758673] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f818888
[ 174.761174] RBP: ffff880079b8bbf8 R08: 000000000000000c R09: 0000000000000000
[ 174.763583] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 174.765616] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 174.767622] FS: 00007fd2d1c82740(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
[ 174.769922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 174.771744] CR2: 00007f1069816541 CR3: 00000000657ec000 CR4: 00000000001406f0
[ 174.774472] Stack:
[ 174.775576] 0000000000018840 0100000000000001 ffff88007f818888 ffffffff82b50168
[ 174.777672] 0000000000000000 ffffffff81188420 0000000000000000 0000000000000001
[ 174.779768] ffff880079b8bc30 ffffffff8111079d 0000000000000008 0000000000000000
[ 174.781996] Call Trace:
[ 174.783490] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.785348] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 174.787503] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 174.789379] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 174.791292] [<ffffffff811e3259>] alloc_pages_vma+0xb9/0x2b0
[ 174.793399] [<ffffffff811be9e0>] handle_mm_fault+0x1430/0x1980
[ 174.795189] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 174.797012] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 174.798779] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 174.800461] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 174.802102] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 174.807003] NMI backtrace for cpu 3
[ 174.808419] CPU: 3 PID: 3765 Comm: nmbd Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 174.810860] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 174.813685] task: ffff880079c81940 ti: ffff880078d38000 task.ti: ffff880078d38000
[ 174.815857] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 174.818324] RSP: 0000:ffff880078d3ba88 EFLAGS: 00000202
[ 174.820163] RAX: 0000000000000002 RBX: ffff88007f8d8880 RCX: ffff88007f89d8f8
[ 174.822353] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f8d8888
[ 174.824503] RBP: ffff880078d3bac8 R08: 0000000000000005 R09: 0000000000000000
[ 174.826634] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 174.828762] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 174.830885] FS: 00007f72564d5840(0000) GS:ffff88007f8c0000(0000) knlGS:0000000000000000
[ 174.833195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 174.835061] CR2: 00007f0d7c0d7110 CR3: 000000007925d000 CR4: 00000000001406e0
[ 174.837160] Stack:
[ 174.838307] 0000000000018840 0100000000000001 ffff88007f8d8888 ffffffff82b50168
[ 174.840472] 0000000000000003 ffffffff81188420 0000000000000000 0000000000000001
[ 174.842625] ffff880078d3bb00 ffffffff8111079d 0000000000000008 0000000000000000
[ 174.844762] Call Trace:
[ 174.845987] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.847900] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 174.849686] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 174.851483] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 174.853646] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 174.855497] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 174.857860] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 174.859715] [<ffffffff81182d02>] filemap_fault+0x442/0x680
[ 174.861498] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 174.863624] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 174.865439] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 174.867120] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 174.868947] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 174.870763] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 174.872540] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 174.874684] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 174.876354] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 174.881275] NMI backtrace for cpu 1
[ 174.882668] CPU: 1 PID: 5372 Comm: tgid=5368 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 174.885226] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 174.888341] task: ffff8800038c6440 ti: ffff880003af8000 task.ti: ffff880003af8000
[ 174.890688] RIP: 0010:[<ffffffff8104594b>] [<ffffffff8104594b>] __default_send_IPI_dest_field+0x3b/0x60
[ 174.893307] RSP: 0000:ffff88007f843b90 EFLAGS: 00000046
[ 174.895106] RAX: 0000000000000400 RBX: 0000000000000001 RCX: 0000000000000001
[ 174.897232] RDX: 0000000000000400 RSI: 0000000000000002 RDI: 0000000002000000
[ 174.899502] RBP: ffff88007f843ba0 R08: 000000000000000f R09: 0000000000000000
[ 174.901658] R10: 0000000000000001 R11: 0000000000001d1e R12: 000000000000a12e
[ 174.904146] R13: ffffffff81cf59a0 R14: 0000000000000002 R15: 0000000000000082
[ 174.906259] FS: 00007fd2d1c82740(0000) GS:ffff88007f840000(0000) knlGS:0000000000000000
[ 174.908544] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 174.910382] CR2: 00007fd2d176c480 CR3: 0000000005f64000 CR4: 00000000001406e0
[ 174.912446] Stack:
[ 174.913860] 000000007f843ba0 0000000200000002 ffff88007f843be0 ffffffff81045a37
[ 174.915947] ffff88007f843bc0 0000000000000001 0000000000000001 ffffffff81046b10
[ 174.918031] 0000000000000007 ffff88007ace3100 ffff88007f843bf0 ffffffff81046b26
[ 174.920220] Call Trace:
[  174.921448]  <IRQ> [<ffffffff81045a37>] default_send_IPI_mask_sequence_phys+0x67/0xe0
[ 174.923677] [<ffffffff81046b10>] ? irq_force_complete_move+0x100/0x100
[ 174.926005] [<ffffffff81046b26>] nmi_raise_cpu_backtrace+0x16/0x20
[ 174.928079] [<ffffffff813b3104>] nmi_trigger_all_cpu_backtrace+0x64/0xfc
[ 174.930088] [<ffffffff81046b84>] arch_trigger_all_cpu_backtrace+0x14/0x20
[ 174.932193] [<ffffffff8146a5ee>] sysrq_handle_showallcpus+0xe/0x10
[ 174.934125] [<ffffffff8146ad16>] __handle_sysrq+0x136/0x220
[ 174.935914] [<ffffffff8146abe0>] ? __sysrq_get_key_op+0x30/0x30
[ 174.937753] [<ffffffff8146b19e>] sysrq_filter+0x36e/0x3b0
[ 174.939497] [<ffffffff81566172>] input_to_handler+0x52/0x100
[ 174.941283] [<ffffffff81568def>] input_pass_values.part.5+0x1ff/0x280
[ 174.943228] [<ffffffff81568bf0>] ? input_register_handler+0xc0/0xc0
[ 174.945131] [<ffffffff815695a4>] input_handle_event+0x114/0x4e0
[ 174.946947] [<ffffffff815699bf>] input_event+0x4f/0x70
[ 174.948638] [<ffffffff8157113d>] atkbd_interrupt+0x5bd/0x6a0
[ 174.950409] [<ffffffff81563531>] serio_interrupt+0x41/0x80
[ 174.952183] [<ffffffff8156412a>] i8042_interrupt+0x1da/0x3a0
[ 174.954407] [<ffffffff810e0949>] handle_irq_event_percpu+0x69/0x470
[ 174.956320] [<ffffffff810e0d84>] handle_irq_event+0x34/0x60
[ 174.958205] [<ffffffff810e41d3>] handle_edge_irq+0xa3/0x160
[ 174.960187] [<ffffffff8101f647>] handle_irq+0xf7/0x170
[ 174.961870] [<ffffffff8101eaca>] do_IRQ+0x6a/0x130
[ 174.963462] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.969241] [<ffffffff8172b593>] common_interrupt+0x93/0x93
[  174.971035]  <EOI> [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.973042] [<ffffffff811105bf>] ? smp_call_function_many+0x21f/0x2c0
[ 174.975222] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 174.977089] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 174.979940] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 174.981739] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 174.983945] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 174.985740] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 174.987576] [<ffffffff81192061>] __do_page_cache_readahead+0x111/0x380
[ 174.989462] [<ffffffff811920ba>] ? __do_page_cache_readahead+0x16a/0x380
[ 174.991327] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 174.993432] [<ffffffff81182bb3>] filemap_fault+0x2f3/0x680
[ 174.995088] [<ffffffff8130dccd>] ? xfs_filemap_fault+0x5d/0x1f0
[ 174.996808] [<ffffffff810c7c6d>] ? down_read_nested+0x2d/0x50
[ 174.998586] [<ffffffff8131e7c2>] ? xfs_ilock+0x1e2/0x2d0
[ 175.000279] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 175.002649] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 175.004203] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 175.005888] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 175.007575] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 175.009203] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 175.010840] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 175.012346] Code: 90 8b 04 25 00 d3 5f ff f6 c4 10 75 f2 c1 e7 18 89 3c 25 10 d3 5f ff 89 f0 09 d0 80 ce 04 83 fe 02 0f 44 c2 89 04 25 00 d3 5f ff <c9> c3 48 8b 05 74 be ca 00 89 55 f4 89 75 f8 89 7d fc ff 90 18
[ 175.017059] NMI backtrace for cpu 2
[ 175.018546] CPU: 2 PID: 5207 Comm: tgid=5207 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 175.021220] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 175.025744] task: ffff880007860040 ti: ffff88005d7cc000 task.ti: ffff88005d7cc000
[ 175.027749] RIP: 0010:[<ffffffff810d09db>] [<ffffffff810d09db>] __rwsem_do_wake+0x5b/0x150
[ 175.029917] RSP: 0000:ffff88005d7cfc10 EFLAGS: 00000082
[ 175.031562] RAX: ffff8800795dbbc0 RBX: ffff880064645be0 RCX: ffffffff00000000
[ 175.033558] RDX: fffffffe00000000 RSI: 0000000000000001 RDI: ffffffffffffffff
[ 175.035618] RBP: ffff88005d7cfc38 R08: 0000000000000001 R09: 0000000000000000
[ 175.037932] R10: ffff880007860040 R11: ffff8800078608d8 R12: ffff880064645be0
[ 175.039920] R13: 0000000000000282 R14: ffff8800738ed068 R15: ffffea00016cb680
[ 175.041979] FS: 00007fd2d1c82740(0000) GS:ffff88007f880000(0000) knlGS:0000000000000000
[ 175.044474] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 175.046204] CR2: 00007f7252d7b540 CR3: 000000007a495000 CR4: 00000000001406e0
[ 175.048219] Stack:
[ 175.049286] ffff880064645be0 ffff880064645bf8 0000000000000282 ffff8800738ed068
[ 175.051392] ffffea00016cb680 ffff88005d7cfc68 ffffffff810d0b18 ffff880064645be0
[ 175.053477] ffff880078d6db40 ffff88005d7cfdb0 ffff8800738ed068 ffff88005d7cfcb0
[ 175.055587] Call Trace:
[ 175.056745] [<ffffffff810d0b18>] rwsem_wake+0x48/0xb0
[ 175.058402] [<ffffffff813bc99b>] call_rwsem_wake+0x1b/0x30
[ 175.060146] [<ffffffff810c7d80>] up_read+0x30/0x40
[ 175.061774] [<ffffffff8118283c>] __lock_page_or_retry+0x6c/0xf0
[ 175.063597] [<ffffffff81182a5c>] filemap_fault+0x19c/0x680
[ 175.065324] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 175.067080] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 175.069002] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 175.070705] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 175.072513] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 175.074323] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 175.076093] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 175.077784] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 175.079423] Code: 5f 5d c3 83 fe 02 0f 84 e5 00 00 00 be 01 00 00 00 48 b9 00 00 00 00 ff ff ff ff 48 c7 c7 ff ff ff ff 48 89 f2 f0 49 0f c1 14 24 <48> 39 ca 0f 8c c7 00 00 00 49 8b 54 24 08 be 01 00 00 00 4d 8d
(...snipped...)
[ 209.471203] sysrq: SysRq : Show backtrace of all active CPUs
[ 209.473421] Sending NMI to all CPUs:
[ 209.475059] NMI backtrace for cpu 0
[ 209.476875] CPU: 0 PID: 5313 Comm: tgid=5309 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.479592] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.482566] task: ffff880073e91940 ti: ffff880079b88000 task.ti: ffff880079b88000
[ 209.484837] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 209.487465] RSP: 0000:ffff880079b8bbb8 EFLAGS: 00000202
[ 209.489353] RAX: 0000000000000002 RBX: ffff88007f818880 RCX: ffff88007f89c238
[ 209.491539] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f818888
[ 209.493706] RBP: ffff880079b8bbf8 R08: 000000000000000c R09: 0000000000000000
[ 209.495889] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 209.498366] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 209.500459] FS: 00007fd2d1c82740(0000) GS:ffff88007f800000(0000) knlGS:0000000000000000
[ 209.502701] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.504539] CR2: 00007f1069816541 CR3: 00000000657ec000 CR4: 00000000001406f0
[ 209.508705] Stack:
[ 209.510195] 0000000000018840 0100000000000001 ffff88007f818888 ffffffff82b50168
[ 209.512492] 0000000000000000 ffffffff81188420 0000000000000000 0000000000000001
[ 209.514658] ffff880079b8bc30 ffffffff8111079d 0000000000000008 0000000000000000
[ 209.516853] Call Trace:
[ 209.518138] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.520060] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 209.521855] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 209.523842] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 209.525752] [<ffffffff811e3259>] alloc_pages_vma+0xb9/0x2b0
[ 209.527587] [<ffffffff811be9e0>] handle_mm_fault+0x1430/0x1980
[ 209.529484] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.531536] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.533310] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.535023] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.536666] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 209.542321] NMI backtrace for cpu 3
[ 209.543740] CPU: 3 PID: 3765 Comm: nmbd Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.546090] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.549111] task: ffff880079c81940 ti: ffff880078d38000 task.ti: ffff880078d38000
[ 209.551336] RIP: 0010:[<ffffffff811105bf>] [<ffffffff811105bf>] smp_call_function_many+0x21f/0x2c0
[ 209.553848] RSP: 0000:ffff880078d3ba88 EFLAGS: 00000202
[ 209.555668] RAX: 0000000000000002 RBX: ffff88007f8d8880 RCX: ffff88007f89d8f8
[ 209.558128] RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffff88007f8d8888
[ 209.561152] RBP: ffff880078d3bac8 R08: 0000000000000005 R09: 0000000000000000
[ 209.563334] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
[ 209.565526] R13: 0000000000000008 R14: ffffffff81188420 R15: 0000000000000000
[ 209.567984] FS: 00007f72564d5840(0000) GS:ffff88007f8c0000(0000) knlGS:0000000000000000
[ 209.570302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.573204] CR2: 00007f0d7c0d7110 CR3: 000000007925d000 CR4: 00000000001406e0
[ 209.575365] Stack:
[ 209.576556] 0000000000018840 0100000000000001 ffff88007f8d8888 ffffffff82b50168
[ 209.578709] 0000000000000003 ffffffff81188420 0000000000000000 0000000000000001
[ 209.580948] ffff880078d3bb00 ffffffff8111079d 0000000000000008 0000000000000000
[ 209.583196] Call Trace:
[ 209.584428] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.586703] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 209.588513] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 209.590445] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 209.592459] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 209.595874] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 209.597738] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 209.599601] [<ffffffff81182d02>] filemap_fault+0x442/0x680
[ 209.601363] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 209.603141] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 209.604944] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 209.607422] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 209.609281] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.611068] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.612826] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.614518] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.616155] Code: d2 e8 26 33 2b 00 3b 05 a4 26 be 00 41 89 c4 0f 8d 6b fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 20 14 cf 81 f6 41 18 01 74 08 f3 90 <f6> 41 18 01 75 f8 83 f8 ff 74 ba 83 f8 07 76 b5 80 3d 70 76 bc
[ 209.621077] NMI backtrace for cpu 1
[ 209.622473] CPU: 1 PID: 5372 Comm: tgid=5368 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.624907] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.628064] task: ffff8800038c6440 ti: ffff880003af8000 task.ti: ffff880003af8000
[ 209.630234] RIP: 0010:[<ffffffff8104594b>] [<ffffffff8104594b>] __default_send_IPI_dest_field+0x3b/0x60
[ 209.632774] RSP: 0000:ffff88007f843b90 EFLAGS: 00000046
[ 209.634554] RAX: 0000000000000400 RBX: 0000000000000001 RCX: 0000000000000001
[ 209.636682] RDX: 0000000000000400 RSI: 0000000000000002 RDI: 0000000002000000
[ 209.639148] RBP: ffff88007f843ba0 R08: 000000000000000f R09: 0000000000000000
[ 209.641495] R10: 0000000000000001 R11: 0000000000001e88 R12: 000000000000a12e
[ 209.643584] R13: ffffffff81cf59a0 R14: 0000000000000002 R15: 0000000000000082
[ 209.645672] FS: 00007fd2d1c82740(0000) GS:ffff88007f840000(0000) knlGS:0000000000000000
[ 209.648320] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.650338] CR2: 00007fd2d176c480 CR3: 0000000005f64000 CR4: 00000000001406e0
[ 209.656131] Stack:
[ 209.658919] 000000007f843ba0 0000000200000002 ffff88007f843be0 ffffffff81045a37
[ 209.665787] ffff88007f843bc0 0000000000000001 0000000000000001 ffffffff81046b10
[ 209.671430] 0000000000000007 ffff88007ace3100 ffff88007f843bf0 ffffffff81046b26
[ 209.675269] Call Trace:
[ 209.676537] <IRQ>  [<ffffffff81045a37>] default_send_IPI_mask_sequence_phys+0x67/0xe0
[ 209.678825] [<ffffffff81046b10>] ? irq_force_complete_move+0x100/0x100
[ 209.680819] [<ffffffff81046b26>] nmi_raise_cpu_backtrace+0x16/0x20
[ 209.682741] [<ffffffff813b3104>] nmi_trigger_all_cpu_backtrace+0x64/0xfc
[ 209.684727] [<ffffffff81046b84>] arch_trigger_all_cpu_backtrace+0x14/0x20
[ 209.686744] [<ffffffff8146a5ee>] sysrq_handle_showallcpus+0xe/0x10
[ 209.689182] [<ffffffff8146ad16>] __handle_sysrq+0x136/0x220
[ 209.691013] [<ffffffff8146abe0>] ? __sysrq_get_key_op+0x30/0x30
[ 209.692861] [<ffffffff8146b19e>] sysrq_filter+0x36e/0x3b0
[ 209.694591] [<ffffffff81566172>] input_to_handler+0x52/0x100
[ 209.696354] [<ffffffff81568def>] input_pass_values.part.5+0x1ff/0x280
[ 209.698259] [<ffffffff81568bf0>] ? input_register_handler+0xc0/0xc0
[ 209.700143] [<ffffffff815695a4>] input_handle_event+0x114/0x4e0
[ 209.701948] [<ffffffff815699bf>] input_event+0x4f/0x70
[ 209.703670] [<ffffffff8157113d>] atkbd_interrupt+0x5bd/0x6a0
[ 209.705451] [<ffffffff81563531>] serio_interrupt+0x41/0x80
[ 209.707256] [<ffffffff8156412a>] i8042_interrupt+0x1da/0x3a0
[ 209.709449] [<ffffffff810e0949>] handle_irq_event_percpu+0x69/0x470
[ 209.711598] [<ffffffff810e0d84>] handle_irq_event+0x34/0x60
[ 209.713558] [<ffffffff810e41d3>] handle_edge_irq+0xa3/0x160
[ 209.715308] [<ffffffff8101f647>] handle_irq+0xf7/0x170
[ 209.716966] [<ffffffff8101eaca>] do_IRQ+0x6a/0x130
[ 209.718558] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.720407] [<ffffffff8172b593>] common_interrupt+0x93/0x93
[ 209.722147] <EOI>  [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.724180] [<ffffffff811105bf>] ? smp_call_function_many+0x21f/0x2c0
[ 209.726093] [<ffffffff81188420>] ? page_alloc_cpu_notify+0x40/0x40
[ 209.728009] [<ffffffff8111079d>] on_each_cpu_mask+0x3d/0xc0
[ 209.729760] [<ffffffff81189234>] drain_all_pages+0xf4/0x130
[ 209.731481] [<ffffffff8118b88b>] __alloc_pages_nodemask+0x8ab/0xe80
[ 209.733322] [<ffffffff811e0c82>] alloc_pages_current+0x92/0x190
[ 209.735077] [<ffffffff8117eef6>] __page_cache_alloc+0x146/0x180
[ 209.736942] [<ffffffff81192061>] __do_page_cache_readahead+0x111/0x380
[ 209.738924] [<ffffffff811920ba>] ? __do_page_cache_readahead+0x16a/0x380
[ 209.740849] [<ffffffff811809f7>] ? pagecache_get_page+0x27/0x280
[ 209.742584] [<ffffffff81182bb3>] filemap_fault+0x2f3/0x680
[ 209.744241] [<ffffffff8130dccd>] ? xfs_filemap_fault+0x5d/0x1f0
[ 209.745991] [<ffffffff810c7c6d>] ? down_read_nested+0x2d/0x50
[ 209.748244] [<ffffffff8131e7c2>] ? xfs_ilock+0x1e2/0x2d0
[ 209.749849] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 209.751538] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 209.753159] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 209.754844] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.756518] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.758136] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.759695] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.761195] Code: 90 8b 04 25 00 d3 5f ff f6 c4 10 75 f2 c1 e7 18 89 3c 25 10 d3 5f ff 89 f0 09 d0 80 ce 04 83 fe 02 0f 44 c2 89 04 25 00 d3 5f ff <c9> c3 48 8b 05 74 be ca 00 89 55 f4 89 75 f8 89 7d fc ff 90 18
[ 209.765816] NMI backtrace for cpu 2
[ 209.767392] CPU: 2 PID: 5207 Comm: tgid=5207 Tainted: G L 4.6.0-rc7-next-20160511 #426
[ 209.769738] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 209.772854] task: ffff880007860040 ti: ffff88005d7cc000 task.ti: ffff88005d7cc000
[ 209.775300] RIP: 0010:[<ffffffff810d0ab4>] [<ffffffff810d0ab4>] __rwsem_do_wake+0x134/0x150
[ 209.777929] RSP: 0000:ffff88005d7cfc10 EFLAGS: 00000097
[ 209.779555] RAX: ffff8800795dbbc0 RBX: ffff880064645be0 RCX: ffffffff00000000
[ 209.781522] RDX: fffffffe00000001 RSI: 0000000000000001 RDI: ffffffffffffffff
[ 209.783490] RBP: ffff88005d7cfc38 R08: 0000000000000001 R09: 0000000000000000
[ 209.785533] R10: ffff880007860040 R11: ffff8800078608d8 R12: ffff880064645be0
[ 209.787876] R13: 0000000000000282 R14: ffff8800738ed068 R15: ffffea00016cb680
[ 209.789858] FS: 00007fd2d1c82740(0000) GS:ffff88007f880000(0000) knlGS:0000000000000000
[ 209.792025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 209.793751] CR2: 00007f7252d7b540 CR3: 000000007a495000 CR4: 00000000001406e0
[ 209.795772] Stack:
[ 209.796839] ffff880064645be0 ffff880064645bf8 0000000000000282 ffff8800738ed068
[ 209.798923] ffffea00016cb680 ffff88005d7cfc68 ffffffff810d0b18 ffff880064645be0
[ 209.800992] ffff880078d6db40 ffff88005d7cfdb0 ffff8800738ed068 ffff88005d7cfcb0
[ 209.803058] Call Trace:
[ 209.804220] [<ffffffff810d0b18>] rwsem_wake+0x48/0xb0
[ 209.806152] [<ffffffff813bc99b>] call_rwsem_wake+0x1b/0x30
[ 209.808068] [<ffffffff810c7d80>] up_read+0x30/0x40
[ 209.809769] [<ffffffff8118283c>] __lock_page_or_retry+0x6c/0xf0
[ 209.811709] [<ffffffff81182a5c>] filemap_fault+0x19c/0x680
[ 209.814493] [<ffffffff81182bc8>] ? filemap_fault+0x308/0x680
[ 209.816282] [<ffffffff8130dcd8>] xfs_filemap_fault+0x68/0x1f0
[ 209.818121] [<ffffffff811b8f1b>] __do_fault+0x6b/0x110
[ 209.819798] [<ffffffff811be802>] handle_mm_fault+0x1252/0x1980
[ 209.821614] [<ffffffff811bd5f8>] ? handle_mm_fault+0x48/0x1980
[ 209.823405] [<ffffffff8105432a>] __do_page_fault+0x1aa/0x530
[ 209.825154] [<ffffffff810546d1>] do_page_fault+0x21/0x70
[ 209.827191] [<ffffffff8172cb52>] page_fault+0x22/0x30
[ 209.828843] Code: a1 48 8b 78 10 e8 ad 06 fd ff 5b 4c 89 e0 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 c2 31 f6 e9 43 ff ff ff 48 89 fa f0 49 0f c1 14 24 <83> ea 01 0f 84 15 ff ff ff e9 e3 fe ff ff 66 66 66 66 66 2e 0f
----------------------------------------
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 15:19 ` Tetsuo Handa
@ 2016-05-11 17:46 ` Peter Zijlstra
-1 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2016-05-11 17:46 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: tglx, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
On Thu, May 12, 2016 at 12:19:07AM +0900, Tetsuo Handa wrote:
> Well, I came to feel that this is caused by down_write_killable() bug.
> I guess we should fix down_write_killable() bug first.
There's a patch you can try in that thread...
lkml.kernel.org/r/20160511094128.GB3190@twins.programming.kicks-ass.net
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: x86_64 Question: Are concurrent IPI requests safe?
2016-05-11 17:46 ` Peter Zijlstra
@ 2016-05-11 21:56 ` Tetsuo Handa
-1 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2016-05-11 21:56 UTC (permalink / raw)
To: peterz; +Cc: tglx, mingo, akpm, mgorman, mhocko, linux-kernel, linux-mm
Peter Zijlstra wrote:
> On Thu, May 12, 2016 at 12:19:07AM +0900, Tetsuo Handa wrote:
> > Well, I came to feel that this is caused by down_write_killable() bug.
> > I guess we should fix down_write_killable() bug first.
>
> There's a patch you can try in that thread...
>
> lkml.kernel.org/r/20160511094128.GB3190@twins.programming.kicks-ass.net
>
OK. Applying that patch on next-20160511 seems to fix this problem.
Please send it to linux-next tree. Thank you.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2016-05-11 21:56 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-06 10:58 [next-20160506 mm,smp] hang up at csd_lock_wait() from drain_all_pages() Tetsuo Handa
2016-05-09 14:54 ` x86_64 Question: Are concurrent IPI requests safe? Tetsuo Handa
2016-05-09 16:54 ` Thomas Gleixner
2016-05-11 13:19 ` Tetsuo Handa
2016-05-11 13:39 ` Peter Zijlstra
2016-05-11 14:09 ` Tetsuo Handa
2016-05-11 14:33 ` Thomas Gleixner
2016-05-11 15:19 ` Tetsuo Handa
2016-05-11 17:46 ` Peter Zijlstra
2016-05-11 21:56 ` Tetsuo Handa
2016-05-11 14:13 ` Thomas Gleixner