* mm: hangs in free_pages_prepare
@ 2015-03-07 18:38 Sasha Levin
2015-03-08 20:38 ` Michal Hocko
0 siblings, 1 reply; 5+ messages in thread
From: Sasha Levin @ 2015-03-07 18:38 UTC (permalink / raw)
To: linux-mm; +Cc: akpm >> Andrew Morton
Hi all,
I've disabled preemption on my fuzzing setup, and am hitting lots of the following, which
go back all the way to <3.18:
[ 1573.730097] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [trinity-c42:27057]
[ 1573.730097] Modules linked in:
[ 1573.730097] irq event stamp: 13148952
[ 1573.730097] hardirqs last enabled at (13148951): free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
[ 1573.730097] hardirqs last disabled at (13148952): apic_timer_interrupt (arch/x86/kernel/entry_64.S:961)
[ 1573.730097] softirqs last enabled at (13142584): __do_softirq (./arch/x86/include/asm/preempt.h:22 kernel/softirq.c:300)
[ 1573.730097] softirqs last disabled at (13142579): irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
[ 1573.730097] CPU: 12 PID: 27057 Comm: trinity-c42 Not tainted 4.0.0-rc2-next-20150306-sasha-00035-g8286417 #2008
[ 1573.730097] task: ffff8803cff28000 ti: ffff8803cf378000 task.ti: ffff8803cf378000
[ 1573.730097] RIP: __memset (arch/x86/lib/memset_64.S:84)
[ 1573.730097] RSP: 0018:ffff8803cf37fcc0 EFLAGS: 00000202
[ 1573.730097] RAX: ffffffffffffffff RBX: ffff880583325000 RCX: 0000000000000001
[ 1573.730097] RDX: 0000000000000200 RSI: 00000000000000ff RDI: ffffed00b0664dc0
[ 1573.730097] RBP: ffff8803cf37fcc8 R08: 1ffff100b0664c00 R09: 0000000000000000
[ 1573.730097] R10: ffffed00b0664c00 R11: 0000000000000000 R12: ffffffffa6504c34
[ 1573.730097] R13: 0000000041b58ab3 R14: ffffffffaa31a488 R15: ffffffffaa31a480
[ 1573.730097] FS: 00007f071dd34700(0000) GS:ffff88024f200000(0000) knlGS:0000000000000000
[ 1573.730097] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1573.730097] CR2: 0000000000000000 CR3: 00000003e4736000 CR4: 00000000000007a0
[ 1573.730097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1573.730097] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 1573.730097] Stack:
[ 1573.730097] ffffffff997679b9 ffff8803cf37fd48 ffffffff9966b3ea 000000000000026a
[ 1573.730097] 0000000000000000 ffffea00160c9a30 ffffffff99672eb8 ffff8803cff28000
[ 1573.730097] ffff88024f22ddb8 ffffea00160cc960 ffffffffa8c3ba90 ffff8803cf37fd38
[ 1573.730097] Call Trace:
[ 1573.730097] ? kasan_free_pages (mm/kasan/kasan.c:301)
[ 1573.788680] free_pages_prepare (mm/page_alloc.c:791)
[ 1573.788680] ? free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
[ 1573.788680] free_hot_cold_page (mm/page_alloc.c:1543)
[ 1573.788680] __free_pages (mm/page_alloc.c:2957)
[ 1573.788680] ? __vunmap (mm/vmalloc.c:1460 (discriminator 2))
[ 1573.788680] __vunmap (mm/vmalloc.c:1460 (discriminator 2))
[ 1573.788680] vfree (mm/vmalloc.c:1505)
[ 1573.788680] SyS_init_module (kernel/module.c:2503 kernel/module.c:3385)
[ 1573.788680] ? load_module (kernel/module.c:3385)
[ 1573.788680] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1604)
[ 1573.788680] ia32_do_call (arch/x86/ia32/ia32entry.S:446)
[ 1573.788680] Code: b6 ce 48 b8 01 01 01 01 01 01 01 01 48 0f af c1 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 39 66 0f 1f 84 00 00 00 00 00 <48> ff c9 48 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47
All code
========
0: b6 ce mov $0xce,%dh
2: 48 b8 01 01 01 01 01 movabs $0x101010101010101,%rax
9: 01 01 01
c: 48 0f af c1 imul %rcx,%rax
10: 41 89 f9 mov %edi,%r9d
13: 41 83 e1 07 and $0x7,%r9d
17: 75 70 jne 0x89
19: 48 89 d1 mov %rdx,%rcx
1c: 48 c1 e9 06 shr $0x6,%rcx
20: 74 39 je 0x5b
22: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
29: 00 00
2b:* 48 ff c9 dec %rcx <-- trapping instruction
2e: 48 89 07 mov %rax,(%rdi)
31: 48 89 47 08 mov %rax,0x8(%rdi)
35: 48 89 47 10 mov %rax,0x10(%rdi)
39: 48 89 47 18 mov %rax,0x18(%rdi)
3d: 48 89 47 00 mov %rax,0x0(%rdi)
Code starting with the faulting instruction
===========================================
0: 48 ff c9 dec %rcx
3: 48 89 07 mov %rax,(%rdi)
6: 48 89 47 08 mov %rax,0x8(%rdi)
a: 48 89 47 10 mov %rax,0x10(%rdi)
e: 48 89 47 18 mov %rax,0x18(%rdi)
12: 48 89 47 00 mov %rax,0x0(%rdi)
And from a different run:
[ 1369.560070] NMI watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [trinity-c16:27469]
[ 1369.560070] Modules linked in:
[ 1369.560070] irq event stamp: 8701850
[ 1369.560070] hardirqs last enabled at (8701849): _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
[ 1369.560070] hardirqs last disabled at (8701850): apic_timer_interrupt (arch/x86/kernel/entry_64.S:961)
[ 1369.560070] softirqs last enabled at (8693324): __do_softirq (./arch/x86/include/asm/preempt.h:22 kernel/softirq.c:300)
[ 1369.560070] softirqs last disabled at (8693321): irq_exit (kernel/softirq.c:350 kernel/softirq.c:391)
[ 1369.560070] CPU: 16 PID: 27469 Comm: trinity-c16 Not tainted 4.0.0-rc2-next-20150306-sasha-00035-g8286417 #2008
[ 1369.560070] task: ffff8804a3293000 ti: ffff8804b9538000 task.ti: ffff8804b9538000
[ 1369.560070] RIP: _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
[ 1369.560070] RSP: 0000:ffff8804b953fba8 EFLAGS: 00000286
[ 1369.560070] RAX: dffffc0000000000 RBX: ffffffff9c0b5c39 RCX: ffff8804a3293000
[ 1369.560070] RDX: 1ffffffff4f155c9 RSI: 0000000000000001 RDI: 0000000000000286
[ 1369.560070] RBP: ffff8804b953fbb8 R08: 0000000000000000 R09: 0000000000000000
[ 1369.560070] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000086
[ 1369.560070] R13: ffff8804b953fb88 R14: 0000000000000000 R15: 0000000000000001
[ 1369.560070] FS: 00007f1d92980700(0000) GS:ffff8802f7200000(0000) knlGS:0000000000000000
[ 1369.560070] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1369.560070] CR2: 0000000000000000 CR3: 00000004b3430000 CR4: 00000000000007a0
[ 1369.560070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1369.560070] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 1369.560070] Stack:
[ 1369.560070] dffffc0000000000 ffff880631538000 ffff8804b953fcb8 ffffffff9c0b5d94
[ 1369.560070] ffffffffa7935440 ffffffffa8533040 ffff8804b953fc38 1ffff100972a7f85
[ 1369.560070] 0000000000000000 ffff8804a3293cd8 0000000000000001 ffff880631539000
[ 1369.560070] Call Trace:
[ 1369.560070] __debug_check_no_obj_freed (lib/debugobjects.c:713)
[ 1369.560070] ? __debug_object_init (lib/debugobjects.c:667)
[ 1369.560070] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
[ 1369.560070] debug_check_no_obj_freed (lib/debugobjects.c:727)
[ 1369.560070] free_pages_prepare (mm/page_alloc.c:816)
[ 1369.560070] ? free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
[ 1369.560070] free_hot_cold_page (mm/page_alloc.c:1543)
[ 1369.560070] __free_pages (mm/page_alloc.c:2957)
[ 1369.560070] ? __vunmap (mm/vmalloc.c:1460 (discriminator 2))
[ 1369.560070] __vunmap (mm/vmalloc.c:1460 (discriminator 2))
[ 1369.560070] vfree (mm/vmalloc.c:1505)
[ 1369.560070] SyS_init_module (kernel/module.c:2503 kernel/module.c:3385)
[ 1369.560070] ? load_module (kernel/module.c:3385)
[ 1369.560070] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1604)
[ 1369.560070] ia32_do_call (arch/x86/ia32/ia32entry.S:446)
[ 1369.560070] Code: c7 48 ae 8a a7 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 6d 48 83 3d dc 5e c9 02 00 74 5a 48 89 df 57 9d <66> 66 90 66 90 65 ff 0d 59 fb 3f 5b 5b 41 5c 5d c3 0f 1f 40 00
All code
========
0: c7 (bad)
1: 48 ae rex.W scas %es:(%rdi),%al
3: 8a a7 48 b8 00 00 mov 0xb848(%rdi),%ah
9: 00 00 add %al,(%rax)
b: 00 fc add %bh,%ah
d: ff df lcallq *<internal disassembler error>
f: 48 89 fa mov %rdi,%rdx
12: 48 c1 ea 03 shr $0x3,%rdx
16: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
1a: 75 6d jne 0x89
1c: 48 83 3d dc 5e c9 02 cmpq $0x0,0x2c95edc(%rip) # 0x2c95f00
23: 00
24: 74 5a je 0x80
26: 48 89 df mov %rbx,%rdi
29: 57 push %rdi
2a: 9d popfq
2b:* 66 66 90 data32 xchg %ax,%ax <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 65 ff 0d 59 fb 3f 5b decl %gs:0x5b3ffb59(%rip) # 0x5b3ffb90
37: 5b pop %rbx
38: 41 5c pop %r12
3a: 5d pop %rbp
3b: c3 retq
3c: 0f 1f 40 00 nopl 0x0(%rax)
...
Code starting with the faulting instruction
===========================================
0: 66 66 90 data32 xchg %ax,%ax
3: 66 90 xchg %ax,%ax
5: 65 ff 0d 59 fb 3f 5b decl %gs:0x5b3ffb59(%rip) # 0x5b3ffb65
c: 5b pop %rbx
d: 41 5c pop %r12
f: 5d pop %rbp
10: c3 retq
11: 0f 1f 40 00 nopl 0x0(%rax)
Thanks,
Sasha
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: mm: hangs in free_pages_prepare
2015-03-07 18:38 mm: hangs in free_pages_prepare Sasha Levin
@ 2015-03-08 20:38 ` Michal Hocko
2015-03-08 20:49 ` Sasha Levin
2015-03-17 8:58 ` Michal Hocko
0 siblings, 2 replies; 5+ messages in thread
From: Michal Hocko @ 2015-03-08 20:38 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-mm, akpm >> Andrew Morton
On Sat 07-03-15 13:38:08, Sasha Levin wrote:
[...]
> [ 1573.730097] ? kasan_free_pages (mm/kasan/kasan.c:301)
> [ 1573.788680] free_pages_prepare (mm/page_alloc.c:791)
> [ 1573.788680] ? free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
> [ 1573.788680] free_hot_cold_page (mm/page_alloc.c:1543)
> [ 1573.788680] __free_pages (mm/page_alloc.c:2957)
> [ 1573.788680] ? __vunmap (mm/vmalloc.c:1460 (discriminator 2))
> [ 1573.788680] __vunmap (mm/vmalloc.c:1460 (discriminator 2))
__vunmap is doing:
for (i = 0; i < area->nr_pages; i++) {
struct page *page = area->pages[i];
BUG_ON(!page);
__free_page(page);
}
is it possible that nr_pages is a huge number (a large vmalloc area)? I
do not see any cond_resched down __free_page path at least. vfree
delayes the call to workqueue when called from irq context and vunmap is
marked as might_sleep). So to me it looks like it would be safe. Something
for vmalloc familiar people, though.
Anyway, the loop seems to be there since ages so I guess somebody just
started calling vmalloc for huge areas recently so it shown up.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: mm: hangs in free_pages_prepare
2015-03-08 20:38 ` Michal Hocko
@ 2015-03-08 20:49 ` Sasha Levin
2015-03-08 21:23 ` Michal Hocko
2015-03-17 8:58 ` Michal Hocko
1 sibling, 1 reply; 5+ messages in thread
From: Sasha Levin @ 2015-03-08 20:49 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, akpm >> Andrew Morton
On 03/08/2015 04:38 PM, Michal Hocko wrote:
> On Sat 07-03-15 13:38:08, Sasha Levin wrote:
> [...]
>> [ 1573.730097] ? kasan_free_pages (mm/kasan/kasan.c:301)
>> [ 1573.788680] free_pages_prepare (mm/page_alloc.c:791)
>> [ 1573.788680] ? free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
>> [ 1573.788680] free_hot_cold_page (mm/page_alloc.c:1543)
>> [ 1573.788680] __free_pages (mm/page_alloc.c:2957)
>> [ 1573.788680] ? __vunmap (mm/vmalloc.c:1460 (discriminator 2))
>> [ 1573.788680] __vunmap (mm/vmalloc.c:1460 (discriminator 2))
>
> __vunmap is doing:
> for (i = 0; i < area->nr_pages; i++) {
> struct page *page = area->pages[i];
>
> BUG_ON(!page);
> __free_page(page);
> }
>
> is it possible that nr_pages is a huge number (a large vmalloc area)? I
> do not see any cond_resched down __free_page path at least. vfree
> delayes the call to workqueue when called from irq context and vunmap is
> marked as might_sleep). So to me it looks like it would be safe. Something
> for vmalloc familiar people, though.
>
> Anyway, the loop seems to be there since ages so I guess somebody just
> started calling vmalloc for huge areas recently so it shown up.
I might be missing something obvious here, but why does that loop exists at all?
Can't we just call __free_pages() instead?
Thanks,
Sasha
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: mm: hangs in free_pages_prepare
2015-03-08 20:49 ` Sasha Levin
@ 2015-03-08 21:23 ` Michal Hocko
0 siblings, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2015-03-08 21:23 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-mm, akpm >> Andrew Morton
On Sun 08-03-15 16:49:26, Sasha Levin wrote:
> On 03/08/2015 04:38 PM, Michal Hocko wrote:
> > On Sat 07-03-15 13:38:08, Sasha Levin wrote:
> > [...]
> >> [ 1573.730097] ? kasan_free_pages (mm/kasan/kasan.c:301)
> >> [ 1573.788680] free_pages_prepare (mm/page_alloc.c:791)
> >> [ 1573.788680] ? free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
> >> [ 1573.788680] free_hot_cold_page (mm/page_alloc.c:1543)
> >> [ 1573.788680] __free_pages (mm/page_alloc.c:2957)
> >> [ 1573.788680] ? __vunmap (mm/vmalloc.c:1460 (discriminator 2))
> >> [ 1573.788680] __vunmap (mm/vmalloc.c:1460 (discriminator 2))
> >
> > __vunmap is doing:
> > for (i = 0; i < area->nr_pages; i++) {
> > struct page *page = area->pages[i];
> >
> > BUG_ON(!page);
> > __free_page(page);
> > }
> >
> > is it possible that nr_pages is a huge number (a large vmalloc area)? I
> > do not see any cond_resched down __free_page path at least. vfree
> > delayes the call to workqueue when called from irq context and vunmap is
> > marked as might_sleep). So to me it looks like it would be safe. Something
> > for vmalloc familiar people, though.
> >
> > Anyway, the loop seems to be there since ages so I guess somebody just
> > started calling vmalloc for huge areas recently so it shown up.
>
> I might be missing something obvious here, but why does that loop exists at all?
>
> Can't we just call __free_pages() instead?
This is vmalloc so the pages are not necessarily physically contiguous,
no?
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: mm: hangs in free_pages_prepare
2015-03-08 20:38 ` Michal Hocko
2015-03-08 20:49 ` Sasha Levin
@ 2015-03-17 8:58 ` Michal Hocko
1 sibling, 0 replies; 5+ messages in thread
From: Michal Hocko @ 2015-03-17 8:58 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-mm, akpm >> Andrew Morton
On Sun 08-03-15 16:38:38, Michal Hocko wrote:
> On Sat 07-03-15 13:38:08, Sasha Levin wrote:
> [...]
> > [ 1573.730097] ? kasan_free_pages (mm/kasan/kasan.c:301)
> > [ 1573.788680] free_pages_prepare (mm/page_alloc.c:791)
> > [ 1573.788680] ? free_hot_cold_page (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) mm/page_alloc.c:1579 (discriminator 2))
> > [ 1573.788680] free_hot_cold_page (mm/page_alloc.c:1543)
> > [ 1573.788680] __free_pages (mm/page_alloc.c:2957)
> > [ 1573.788680] ? __vunmap (mm/vmalloc.c:1460 (discriminator 2))
> > [ 1573.788680] __vunmap (mm/vmalloc.c:1460 (discriminator 2))
>
> __vunmap is doing:
> for (i = 0; i < area->nr_pages; i++) {
> struct page *page = area->pages[i];
>
> BUG_ON(!page);
> __free_page(page);
> }
>
> is it possible that nr_pages is a huge number (a large vmalloc area)? I
> do not see any cond_resched down __free_page path at least. vfree
> delayes the call to workqueue when called from irq context and vunmap is
> marked as might_sleep). So to me it looks like it would be safe. Something
> for vmalloc familiar people, though.
Hmm, just looked into the git log and it seems that there are/were
some callers of vfree with spinlock held (e.g. 9265f1d0c759 (GFS2:
gfs2_dir_get_hash_table(): avoiding deferred vfree() is easy here...))
and who knows how many others like that we have so cond_resched here is
no-no.
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-03-17 8:58 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-07 18:38 mm: hangs in free_pages_prepare Sasha Levin
2015-03-08 20:38 ` Michal Hocko
2015-03-08 20:49 ` Sasha Levin
2015-03-08 21:23 ` Michal Hocko
2015-03-17 8:58 ` Michal Hocko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.