* mm,console: circular dependency between console_sem and zone lock
@ 2014-05-08 2:03 Sasha Levin
2014-05-12 16:28 ` Jan Kara
0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2014-05-08 2:03 UTC (permalink / raw)
To: linux-mm; +Cc: Andrew Morton, LKML, Dave Jones, Jan Kara, Steven Rostedt
Hi all,
While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following spew:
[ 262.793172] ======================================================
[ 262.794555] [ INFO: possible circular locking dependency detected ]
[ 262.796110] 3.15.0-rc4-next-20140507-sasha-00004-g14be78b-dirty #448 Tainted: G W
[ 262.798430] -------------------------------------------------------
[ 262.799804] runtrin.sh/9791 is trying to acquire lock:
[ 262.801168] ((console_sem).lock){-.-...}, at: down_trylock (kernel/locking/semaphore.c:137)
[ 262.801216]
[ 262.801216] but task is already holding lock:
[ 262.801216] (&(&zone->lock)->rlock){-.-...}, at: __offline_isolated_pages (mm/page_alloc.c:6427)
[ 262.801216]
[ 262.801216] which lock already depends on the new lock.
[ 262.801216]
[ 262.801216]
[ 262.801216] the existing dependency chain (in reverse order) is:
[ 262.801216]
-> #3 (&(&zone->lock)->rlock){-.-...}:
[ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 262.801216] get_page_from_freelist (mm/page_alloc.c:1574 mm/page_alloc.c:2033)
[ 262.801216] __alloc_pages_nodemask (mm/page_alloc.c:2728)
[ 262.801216] alloc_page_interleave (mm/mempolicy.c:1944)
[ 262.801216] alloc_pages_current (mm/mempolicy.c:2041)
[ 262.801216] new_slab (include/linux/gfp.h:337 mm/slub.c:1327 mm/slub.c:1356 mm/slub.c:1418)
[ 262.801216] __slab_alloc (mm/slub.c:2204 mm/slub.c:2364)
[ 262.801216] kmem_cache_alloc (mm/slub.c:2470 mm/slub.c:2481 mm/slub.c:2486)
[ 262.801216] __debug_object_init (lib/debugobjects.c:97 lib/debugobjects.c:311)
[ 262.801216] debug_object_init (lib/debugobjects.c:364)
[ 262.801216] hrtimer_init (kernel/hrtimer.c:437 include/linux/jump_label.h:105 include/trace/events/timer.h:130 kernel/hrtimer.c:482 kernel/hrtimer.c:1222)
[ 262.801216] __sched_fork (kernel/sched/core.c:1745)
[ 262.801216] init_idle (kernel/sched/core.c:4460)
[ 262.801216] fork_idle (kernel/fork.c:1565)
[ 262.801216] idle_threads_init (kernel/smpboot.c:54 kernel/smpboot.c:72)
[ 262.801216] smp_init (kernel/smp.c:535)
[ 262.801216] kernel_init_freeable (init/main.c:854 init/main.c:1007)
[ 262.801216] kernel_init (init/main.c:939)
[ 262.801216] ret_from_fork (arch/x86/kernel/entry_64.S:553)
[ 262.801216]
-> #2 (&rq->lock){-.-.-.}:
[ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 262.801216] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 262.801216] wake_up_new_task (include/linux/sched.h:2873 kernel/sched/core.c:329 kernel/sched/core.c:2027)
[ 262.801216] do_fork (kernel/fork.c:1628)
[ 262.801216] kernel_thread (kernel/fork.c:1650)
[ 262.801216] rest_init (init/main.c:404)
[ 262.801216] start_kernel (init/main.c:683)
[ 262.801216] x86_64_start_reservations (arch/x86/kernel/head64.c:194)
[ 262.801216] x86_64_start_kernel (arch/x86/kernel/head64.c:183)
[ 262.801216]
-> #1 (&p->pi_lock){-.-.-.}:
[ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 262.801216] try_to_wake_up (kernel/sched/core.c:1605)
[ 262.801216] wake_up_process (kernel/sched/core.c:1701 (discriminator 2))
[ 262.801216] __up.isra.0 (kernel/locking/semaphore.c:263)
[ 262.801216] up (kernel/locking/semaphore.c:186)
[ 262.801216] console_unlock (kernel/printk/printk.c:2230)
[ 262.801216] vprintk_emit (kernel/printk/printk.c:1746)
[ 262.801216] dev_vprintk_emit (drivers/base/core.c:2053 (discriminator 3))
[ 262.801216] dev_printk_emit (drivers/base/core.c:2068)
[ 262.801216] __dynamic_dev_dbg (lib/dynamic_debug.c:593)
[ 262.801216] pps_event (drivers/pps/kapi.c:204 (discriminator 1))
[ 262.801216] pps_ktimer_event (drivers/pps/clients/pps-ktimer.c:51)
[ 262.801216] call_timer_fn (kernel/timer.c:1140 include/linux/jump_label.h:105 include/trace/events/timer.h:106 kernel/timer.c:1141)
[ 262.801216] run_timer_softirq (include/linux/spinlock.h:328 kernel/timer.c:1213 kernel/timer.c:1403)
[ 262.801216] __do_softirq (kernel/softirq.c:269 include/linux/jump_label.h:105 include/trace/events/irq.h:126 kernel/softirq.c:270)
[ 262.801216] irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
[ 262.801216] smp_apic_timer_interrupt (arch/x86/include/asm/irq_regs.h:26 arch/x86/kernel/apic/apic.c:947)
[ 262.801216] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1225)
[ 262.801216] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
[ 262.801216] arch_cpu_idle (arch/x86/kernel/process.c:302)
[ 262.801216] cpu_idle_loop (kernel/sched/idle.c:173 kernel/sched/idle.c:220)
[ 262.801216] cpu_startup_entry (??:?)
[ 262.801216] start_secondary (arch/x86/kernel/smpboot.c:274)
[ 262.801216]
-> #0 ((console_sem).lock){-.-...}:
[ 262.801216] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 262.801216] down_trylock (kernel/locking/semaphore.c:137)
[ 262.801216] __down_trylock_console_sem (kernel/printk/printk.c:108)
[ 262.801216] console_trylock (kernel/printk/printk.c:2070)
[ 262.801216] vprintk_emit (kernel/printk/printk.c:1445 kernel/printk/printk.c:1745)
[ 262.801216] printk (kernel/printk/printk.c:1813)
[ 262.801216] __offline_isolated_pages (mm/page_alloc.c:6450)
[ 262.801216] offline_isolated_pages_cb (mm/memory_hotplug.c:1415)
[ 262.801216] walk_system_ram_range (kernel/resource.c:389)
[ 262.801216] __offline_pages.constprop.22 (mm/memory_hotplug.c:1710)
[ 262.801216] offline_pages (mm/memory_hotplug.c:1756)
[ 262.801216] memory_subsys_offline (drivers/base/memory.c:267 drivers/base/memory.c:304)
[ 262.801216] device_offline (drivers/base/core.c:1429)
[ 262.801216] online_store (drivers/base/core.c:451 (discriminator 2))
[ 262.801216] dev_attr_store (drivers/base/core.c:138)
[ 262.801216] sysfs_kf_write (fs/sysfs/file.c:114)
[ 262.801216] kernfs_fop_write (fs/kernfs/file.c:295)
[ 262.801216] vfs_write (fs/read_write.c:532)
[ 262.801216] SyS_write (fs/read_write.c:584 fs/read_write.c:576)
[ 262.801216] tracesys (arch/x86/kernel/entry_64.S:746)
[ 262.801216]
[ 262.801216] other info that might help us debug this:
[ 262.801216]
[ 262.801216] Chain exists of:
(console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock
[ 262.801216] Possible unsafe locking scenario:
[ 262.801216]
[ 262.801216] CPU0 CPU1
[ 262.801216] ---- ----
[ 262.801216] lock(&(&zone->lock)->rlock);
[ 262.801216] lock(&rq->lock);
[ 262.801216] lock(&(&zone->lock)->rlock);
[ 262.801216] lock((console_sem).lock);
[ 262.801216]
[ 262.801216] *** DEADLOCK ***
[ 262.801216]
[ 262.801216] 8 locks held by runtrin.sh/9791:
[ 262.801216] #0: (sb_writers#4){.+.+.+}, at: vfs_write (include/linux/fs.h:2252 fs/read_write.c:530)
[ 262.801216] #1: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283)
[ 262.801216] #2: (s_active#28){.+.+.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)
[ 262.801216] #3: (device_hotplug_lock){+.+.+.}, at: lock_device_hotplug_sysfs (drivers/base/core.c:67)
[ 262.801216] #4: (&dev->mutex){......}, at: device_offline (drivers/base/core.c:2128 drivers/base/core.c:1423)
[ 262.801216] #5: (mem_hotplug.lock){++++++}, at: mem_hotplug_begin (mm/memory_hotplug.c:107)
[ 262.801216] #6: (mem_hotplug.lock#2){+.+.+.}, at: mem_hotplug_begin (mm/memory_hotplug.c:113)
[ 262.801216] #7: (&(&zone->lock)->rlock){-.-...}, at: __offline_isolated_pages (mm/page_alloc.c:6427)
[ 262.801216]
[ 262.801216] stack backtrace:
[ 262.801216] CPU: 6 PID: 9791 Comm: runtrin.sh Tainted: G W 3.15.0-rc4-next-20140507-sasha-00004-g14be78b-dirty #448
[ 262.801216] ffffffff90d73c70 ffff8802c6c158c8 ffffffff8d539450 0000000000000004
[ 262.801216] ffffffff90d6fe70 ffff8802c6c15918 ffffffff8d52bbe0 0000000000000008
[ 262.801216] ffff8802c6c159a8 ffff8802c6c15918 ffff8802c879be78 ffff8802c879beb0
[ 262.801216] Call Trace:
[ 262.801216] dump_stack (lib/dump_stack.c:52)
[ 262.801216] print_circular_bug (kernel/locking/lockdep.c:1216)
[ 262.801216] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 262.801216] ? kvm_clock_read (arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
[ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 262.801216] ? down_trylock (kernel/locking/semaphore.c:137)
[ 262.801216] ? _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:109 kernel/locking/spinlock.c:159)
[ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 262.801216] ? down_trylock (kernel/locking/semaphore.c:137)
[ 262.801216] down_trylock (kernel/locking/semaphore.c:137)
[ 262.801216] ? vprintk_emit (kernel/printk/printk.c:1445 kernel/printk/printk.c:1745)
[ 262.801216] __down_trylock_console_sem (kernel/printk/printk.c:108)
[ 262.801216] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 262.801216] console_trylock (kernel/printk/printk.c:2070)
[ 262.801216] vprintk_emit (kernel/printk/printk.c:1445 kernel/printk/printk.c:1745)
[ 262.801216] printk (kernel/printk/printk.c:1813)
[ 262.801216] ? __offline_isolated_pages (mm/page_alloc.c:6427)
[ 262.801216] __offline_isolated_pages (mm/page_alloc.c:6450)
[ 262.801216] offline_isolated_pages_cb (mm/memory_hotplug.c:1415)
[ 262.801216] walk_system_ram_range (kernel/resource.c:389)
[ 262.801216] ? check_pages_isolated_cb (mm/memory_hotplug.c:1412)
[ 262.801216] __offline_pages.constprop.22 (mm/memory_hotplug.c:1710)
[ 262.801216] ? mutex_lock_nested (arch/x86/include/asm/paravirt.h:809 kernel/locking/mutex.c:569 kernel/locking/mutex.c:587)
[ 262.801216] ? mutex_lock_nested (arch/x86/include/asm/preempt.h:98 kernel/locking/mutex.c:570 kernel/locking/mutex.c:587)
[ 262.801216] ? device_offline (drivers/base/core.c:2128 drivers/base/core.c:1423)
[ 262.801216] offline_pages (mm/memory_hotplug.c:1756)
[ 262.801216] memory_subsys_offline (drivers/base/memory.c:267 drivers/base/memory.c:304)
[ 262.801216] device_offline (drivers/base/core.c:1429)
[ 262.801216] online_store (drivers/base/core.c:451 (discriminator 2))
[ 262.801216] dev_attr_store (drivers/base/core.c:138)
[ 262.801216] sysfs_kf_write (fs/sysfs/file.c:114)
[ 262.801216] kernfs_fop_write (fs/kernfs/file.c:295)
[ 262.801216] vfs_write (fs/read_write.c:532)
[ 262.801216] SyS_write (fs/read_write.c:584 fs/read_write.c:576)
[ 262.801216] tracesys (arch/x86/kernel/entry_64.S:746)
Thanks,
Sasha
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,console: circular dependency between console_sem and zone lock
2014-05-08 2:03 mm,console: circular dependency between console_sem and zone lock Sasha Levin
@ 2014-05-12 16:28 ` Jan Kara
2014-06-01 14:08 ` Sasha Levin
0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2014-05-12 16:28 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-mm, Andrew Morton, LKML, Dave Jones, Jan Kara, Steven Rostedt
On Wed 07-05-14 22:03:08, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew:
Thanks for report. So the problem seems to be maginally valid but I'm not
100% sure whom to blame :). So printk() code calls up() which calls
try_to_wake_up() under console_sem.lock spinlock. That function can take
rq->lock which is all expected.
The next part of the chain is that during CPU initialization we call
__sched_fork() with rq->lock which calls into hrtimer_init() which can
allocate memory which creates a dependency rq->lock => zone.lock.rlock.
And memory management code calls printk() which zone.lock.rlock held which
closes the loop. Now I suspect the second link in the chain can happen only
while CPU is booting and might even happen only if some debug options are
enabled. But I don't really know scheduler code well enough. Steven?
Honza
> [ 262.793172] ======================================================
> [ 262.794555] [ INFO: possible circular locking dependency detected ]
> [ 262.796110] 3.15.0-rc4-next-20140507-sasha-00004-g14be78b-dirty #448 Tainted: G W
> [ 262.798430] -------------------------------------------------------
> [ 262.799804] runtrin.sh/9791 is trying to acquire lock:
> [ 262.801168] ((console_sem).lock){-.-...}, at: down_trylock (kernel/locking/semaphore.c:137)
> [ 262.801216]
> [ 262.801216] but task is already holding lock:
> [ 262.801216] (&(&zone->lock)->rlock){-.-...}, at: __offline_isolated_pages (mm/page_alloc.c:6427)
> [ 262.801216]
> [ 262.801216] which lock already depends on the new lock.
> [ 262.801216]
> [ 262.801216]
> [ 262.801216] the existing dependency chain (in reverse order) is:
> [ 262.801216]
> -> #3 (&(&zone->lock)->rlock){-.-...}:
> [ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
> [ 262.801216] get_page_from_freelist (mm/page_alloc.c:1574 mm/page_alloc.c:2033)
> [ 262.801216] __alloc_pages_nodemask (mm/page_alloc.c:2728)
> [ 262.801216] alloc_page_interleave (mm/mempolicy.c:1944)
> [ 262.801216] alloc_pages_current (mm/mempolicy.c:2041)
> [ 262.801216] new_slab (include/linux/gfp.h:337 mm/slub.c:1327 mm/slub.c:1356 mm/slub.c:1418)
> [ 262.801216] __slab_alloc (mm/slub.c:2204 mm/slub.c:2364)
> [ 262.801216] kmem_cache_alloc (mm/slub.c:2470 mm/slub.c:2481 mm/slub.c:2486)
> [ 262.801216] __debug_object_init (lib/debugobjects.c:97 lib/debugobjects.c:311)
> [ 262.801216] debug_object_init (lib/debugobjects.c:364)
> [ 262.801216] hrtimer_init (kernel/hrtimer.c:437 include/linux/jump_label.h:105 include/trace/events/timer.h:130 kernel/hrtimer.c:482 kernel/hrtimer.c:1222)
> [ 262.801216] __sched_fork (kernel/sched/core.c:1745)
> [ 262.801216] init_idle (kernel/sched/core.c:4460)
> [ 262.801216] fork_idle (kernel/fork.c:1565)
> [ 262.801216] idle_threads_init (kernel/smpboot.c:54 kernel/smpboot.c:72)
> [ 262.801216] smp_init (kernel/smp.c:535)
> [ 262.801216] kernel_init_freeable (init/main.c:854 init/main.c:1007)
> [ 262.801216] kernel_init (init/main.c:939)
> [ 262.801216] ret_from_fork (arch/x86/kernel/entry_64.S:553)
> [ 262.801216]
> -> #2 (&rq->lock){-.-.-.}:
> [ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [ 262.801216] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
> [ 262.801216] wake_up_new_task (include/linux/sched.h:2873 kernel/sched/core.c:329 kernel/sched/core.c:2027)
> [ 262.801216] do_fork (kernel/fork.c:1628)
> [ 262.801216] kernel_thread (kernel/fork.c:1650)
> [ 262.801216] rest_init (init/main.c:404)
> [ 262.801216] start_kernel (init/main.c:683)
> [ 262.801216] x86_64_start_reservations (arch/x86/kernel/head64.c:194)
> [ 262.801216] x86_64_start_kernel (arch/x86/kernel/head64.c:183)
> [ 262.801216]
> -> #1 (&p->pi_lock){-.-.-.}:
> [ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
> [ 262.801216] try_to_wake_up (kernel/sched/core.c:1605)
> [ 262.801216] wake_up_process (kernel/sched/core.c:1701 (discriminator 2))
> [ 262.801216] __up.isra.0 (kernel/locking/semaphore.c:263)
> [ 262.801216] up (kernel/locking/semaphore.c:186)
> [ 262.801216] console_unlock (kernel/printk/printk.c:2230)
> [ 262.801216] vprintk_emit (kernel/printk/printk.c:1746)
> [ 262.801216] dev_vprintk_emit (drivers/base/core.c:2053 (discriminator 3))
> [ 262.801216] dev_printk_emit (drivers/base/core.c:2068)
> [ 262.801216] __dynamic_dev_dbg (lib/dynamic_debug.c:593)
> [ 262.801216] pps_event (drivers/pps/kapi.c:204 (discriminator 1))
> [ 262.801216] pps_ktimer_event (drivers/pps/clients/pps-ktimer.c:51)
> [ 262.801216] call_timer_fn (kernel/timer.c:1140 include/linux/jump_label.h:105 include/trace/events/timer.h:106 kernel/timer.c:1141)
> [ 262.801216] run_timer_softirq (include/linux/spinlock.h:328 kernel/timer.c:1213 kernel/timer.c:1403)
> [ 262.801216] __do_softirq (kernel/softirq.c:269 include/linux/jump_label.h:105 include/trace/events/irq.h:126 kernel/softirq.c:270)
> [ 262.801216] irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
> [ 262.801216] smp_apic_timer_interrupt (arch/x86/include/asm/irq_regs.h:26 arch/x86/kernel/apic/apic.c:947)
> [ 262.801216] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1225)
> [ 262.801216] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
> [ 262.801216] arch_cpu_idle (arch/x86/kernel/process.c:302)
> [ 262.801216] cpu_idle_loop (kernel/sched/idle.c:173 kernel/sched/idle.c:220)
> [ 262.801216] cpu_startup_entry (??:?)
> [ 262.801216] start_secondary (arch/x86/kernel/smpboot.c:274)
> [ 262.801216]
> -> #0 ((console_sem).lock){-.-...}:
> [ 262.801216] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
> [ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
> [ 262.801216] down_trylock (kernel/locking/semaphore.c:137)
> [ 262.801216] __down_trylock_console_sem (kernel/printk/printk.c:108)
> [ 262.801216] console_trylock (kernel/printk/printk.c:2070)
> [ 262.801216] vprintk_emit (kernel/printk/printk.c:1445 kernel/printk/printk.c:1745)
> [ 262.801216] printk (kernel/printk/printk.c:1813)
> [ 262.801216] __offline_isolated_pages (mm/page_alloc.c:6450)
> [ 262.801216] offline_isolated_pages_cb (mm/memory_hotplug.c:1415)
> [ 262.801216] walk_system_ram_range (kernel/resource.c:389)
> [ 262.801216] __offline_pages.constprop.22 (mm/memory_hotplug.c:1710)
> [ 262.801216] offline_pages (mm/memory_hotplug.c:1756)
> [ 262.801216] memory_subsys_offline (drivers/base/memory.c:267 drivers/base/memory.c:304)
> [ 262.801216] device_offline (drivers/base/core.c:1429)
> [ 262.801216] online_store (drivers/base/core.c:451 (discriminator 2))
> [ 262.801216] dev_attr_store (drivers/base/core.c:138)
> [ 262.801216] sysfs_kf_write (fs/sysfs/file.c:114)
> [ 262.801216] kernfs_fop_write (fs/kernfs/file.c:295)
> [ 262.801216] vfs_write (fs/read_write.c:532)
> [ 262.801216] SyS_write (fs/read_write.c:584 fs/read_write.c:576)
> [ 262.801216] tracesys (arch/x86/kernel/entry_64.S:746)
> [ 262.801216]
> [ 262.801216] other info that might help us debug this:
> [ 262.801216]
> [ 262.801216] Chain exists of:
> (console_sem).lock --> &rq->lock --> &(&zone->lock)->rlock
>
> [ 262.801216] Possible unsafe locking scenario:
> [ 262.801216]
> [ 262.801216] CPU0 CPU1
> [ 262.801216] ---- ----
> [ 262.801216] lock(&(&zone->lock)->rlock);
> [ 262.801216] lock(&rq->lock);
> [ 262.801216] lock(&(&zone->lock)->rlock);
> [ 262.801216] lock((console_sem).lock);
> [ 262.801216]
> [ 262.801216] *** DEADLOCK ***
> [ 262.801216]
> [ 262.801216] 8 locks held by runtrin.sh/9791:
> [ 262.801216] #0: (sb_writers#4){.+.+.+}, at: vfs_write (include/linux/fs.h:2252 fs/read_write.c:530)
> [ 262.801216] #1: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283)
> [ 262.801216] #2: (s_active#28){.+.+.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)
> [ 262.801216] #3: (device_hotplug_lock){+.+.+.}, at: lock_device_hotplug_sysfs (drivers/base/core.c:67)
> [ 262.801216] #4: (&dev->mutex){......}, at: device_offline (drivers/base/core.c:2128 drivers/base/core.c:1423)
> [ 262.801216] #5: (mem_hotplug.lock){++++++}, at: mem_hotplug_begin (mm/memory_hotplug.c:107)
> [ 262.801216] #6: (mem_hotplug.lock#2){+.+.+.}, at: mem_hotplug_begin (mm/memory_hotplug.c:113)
> [ 262.801216] #7: (&(&zone->lock)->rlock){-.-...}, at: __offline_isolated_pages (mm/page_alloc.c:6427)
> [ 262.801216]
> [ 262.801216] stack backtrace:
> [ 262.801216] CPU: 6 PID: 9791 Comm: runtrin.sh Tainted: G W 3.15.0-rc4-next-20140507-sasha-00004-g14be78b-dirty #448
> [ 262.801216] ffffffff90d73c70 ffff8802c6c158c8 ffffffff8d539450 0000000000000004
> [ 262.801216] ffffffff90d6fe70 ffff8802c6c15918 ffffffff8d52bbe0 0000000000000008
> [ 262.801216] ffff8802c6c159a8 ffff8802c6c15918 ffff8802c879be78 ffff8802c879beb0
> [ 262.801216] Call Trace:
> [ 262.801216] dump_stack (lib/dump_stack.c:52)
> [ 262.801216] print_circular_bug (kernel/locking/lockdep.c:1216)
> [ 262.801216] __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
> [ 262.801216] ? kvm_clock_read (arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
> [ 262.801216] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
> [ 262.801216] ? down_trylock (kernel/locking/semaphore.c:137)
> [ 262.801216] ? _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:109 kernel/locking/spinlock.c:159)
> [ 262.801216] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
> [ 262.801216] ? down_trylock (kernel/locking/semaphore.c:137)
> [ 262.801216] down_trylock (kernel/locking/semaphore.c:137)
> [ 262.801216] ? vprintk_emit (kernel/printk/printk.c:1445 kernel/printk/printk.c:1745)
> [ 262.801216] __down_trylock_console_sem (kernel/printk/printk.c:108)
> [ 262.801216] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 262.801216] console_trylock (kernel/printk/printk.c:2070)
> [ 262.801216] vprintk_emit (kernel/printk/printk.c:1445 kernel/printk/printk.c:1745)
> [ 262.801216] printk (kernel/printk/printk.c:1813)
> [ 262.801216] ? __offline_isolated_pages (mm/page_alloc.c:6427)
> [ 262.801216] __offline_isolated_pages (mm/page_alloc.c:6450)
> [ 262.801216] offline_isolated_pages_cb (mm/memory_hotplug.c:1415)
> [ 262.801216] walk_system_ram_range (kernel/resource.c:389)
> [ 262.801216] ? check_pages_isolated_cb (mm/memory_hotplug.c:1412)
> [ 262.801216] __offline_pages.constprop.22 (mm/memory_hotplug.c:1710)
> [ 262.801216] ? mutex_lock_nested (arch/x86/include/asm/paravirt.h:809 kernel/locking/mutex.c:569 kernel/locking/mutex.c:587)
> [ 262.801216] ? mutex_lock_nested (arch/x86/include/asm/preempt.h:98 kernel/locking/mutex.c:570 kernel/locking/mutex.c:587)
> [ 262.801216] ? device_offline (drivers/base/core.c:2128 drivers/base/core.c:1423)
> [ 262.801216] offline_pages (mm/memory_hotplug.c:1756)
> [ 262.801216] memory_subsys_offline (drivers/base/memory.c:267 drivers/base/memory.c:304)
> [ 262.801216] device_offline (drivers/base/core.c:1429)
> [ 262.801216] online_store (drivers/base/core.c:451 (discriminator 2))
> [ 262.801216] dev_attr_store (drivers/base/core.c:138)
> [ 262.801216] sysfs_kf_write (fs/sysfs/file.c:114)
> [ 262.801216] kernfs_fop_write (fs/kernfs/file.c:295)
> [ 262.801216] vfs_write (fs/read_write.c:532)
> [ 262.801216] SyS_write (fs/read_write.c:584 fs/read_write.c:576)
> [ 262.801216] tracesys (arch/x86/kernel/entry_64.S:746)
>
>
> Thanks,
> Sasha
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,console: circular dependency between console_sem and zone lock
2014-05-12 16:28 ` Jan Kara
@ 2014-06-01 14:08 ` Sasha Levin
2014-06-03 1:55 ` Steven Rostedt
2014-06-03 9:00 ` Peter Zijlstra
0 siblings, 2 replies; 6+ messages in thread
From: Sasha Levin @ 2014-06-01 14:08 UTC (permalink / raw)
To: Jan Kara
Cc: linux-mm, Andrew Morton, LKML, Dave Jones, Steven Rostedt,
Peter Zijlstra, Ingo Molnar
On 05/12/2014 12:28 PM, Jan Kara wrote:
> On Wed 07-05-14 22:03:08, Sasha Levin wrote:
>> > Hi all,
>> >
>> > While fuzzing with trinity inside a KVM tools guest running the latest -next
>> > kernel I've stumbled on the following spew:
> Thanks for report. So the problem seems to be maginally valid but I'm not
> 100% sure whom to blame :). So printk() code calls up() which calls
> try_to_wake_up() under console_sem.lock spinlock. That function can take
> rq->lock which is all expected.
>
> The next part of the chain is that during CPU initialization we call
> __sched_fork() with rq->lock which calls into hrtimer_init() which can
> allocate memory which creates a dependency rq->lock => zone.lock.rlock.
>
> And memory management code calls printk() which zone.lock.rlock held which
> closes the loop. Now I suspect the second link in the chain can happen only
> while CPU is booting and might even happen only if some debug options are
> enabled. But I don't really know scheduler code well enough. Steven?
I've cc'ed Peter and Ingo who may be able to answer that, as it still happens
on -next.
Thanks,
Sasha
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,console: circular dependency between console_sem and zone lock
2014-06-01 14:08 ` Sasha Levin
@ 2014-06-03 1:55 ` Steven Rostedt
2014-06-03 8:47 ` Peter Zijlstra
2014-06-03 9:00 ` Peter Zijlstra
1 sibling, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2014-06-03 1:55 UTC (permalink / raw)
To: Sasha Levin
Cc: Jan Kara, linux-mm, Andrew Morton, LKML, Dave Jones,
Peter Zijlstra, Ingo Molnar
On Sun, 01 Jun 2014 10:08:21 -0400
Sasha Levin <sasha.levin@oracle.com> wrote:
> On 05/12/2014 12:28 PM, Jan Kara wrote:
> > On Wed 07-05-14 22:03:08, Sasha Levin wrote:
> >> > Hi all,
> >> >
> >> > While fuzzing with trinity inside a KVM tools guest running the latest -next
> >> > kernel I've stumbled on the following spew:
> > Thanks for report. So the problem seems to be maginally valid but I'm not
> > 100% sure whom to blame :). So printk() code calls up() which calls
> > try_to_wake_up() under console_sem.lock spinlock. That function can take
> > rq->lock which is all expected.
> >
> > The next part of the chain is that during CPU initialization we call
> > __sched_fork() with rq->lock which calls into hrtimer_init() which can
> > allocate memory which creates a dependency rq->lock => zone.lock.rlock.
> >
> > And memory management code calls printk() which zone.lock.rlock held which
> > closes the loop. Now I suspect the second link in the chain can happen only
> > while CPU is booting and might even happen only if some debug options are
> > enabled. But I don't really know scheduler code well enough. Steven?
>
> I've cc'ed Peter and Ingo who may be able to answer that, as it still happens
> on -next.
>
Hmm, it failed on a try lock, but on the spinlock within the trylock. I
wonder if we should add this.
Peter?
-- Steve
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index 6815171..6579f84 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -132,7 +132,9 @@ int down_trylock(struct semaphore *sem)
unsigned long flags;
int count;
- raw_spin_lock_irqsave(&sem->lock, flags);
+ if (!raw_spin_trylock_irqsave(&sem->lock, flags))
+ return 1;
+
count = sem->count - 1;
if (likely(count >= 0))
sem->count = count;
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: mm,console: circular dependency between console_sem and zone lock
2014-06-03 1:55 ` Steven Rostedt
@ 2014-06-03 8:47 ` Peter Zijlstra
0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2014-06-03 8:47 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sasha Levin, Jan Kara, linux-mm, Andrew Morton, LKML, Dave Jones,
Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 895 bytes --]
On Mon, Jun 02, 2014 at 09:55:29PM -0400, Steven Rostedt wrote:
> Hmm, it failed on a try lock, but on the spinlock within the trylock. I
> wonder if we should add this.
>
> Peter?
>
> -- Steve
>
> diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
> index 6815171..6579f84 100644
> --- a/kernel/locking/semaphore.c
> +++ b/kernel/locking/semaphore.c
> @@ -132,7 +132,9 @@ int down_trylock(struct semaphore *sem)
> unsigned long flags;
> int count;
>
> - raw_spin_lock_irqsave(&sem->lock, flags);
> + if (!raw_spin_trylock_irqsave(&sem->lock, flags))
> + return 1;
> +
> count = sem->count - 1;
> if (likely(count >= 0))
> sem->count = count;
I prefer not to, there is no reason the down_trylock() will fail if that
spinlock is contended. We might have just hit the release in progress.
But let me stare at the original problem..
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: mm,console: circular dependency between console_sem and zone lock
2014-06-01 14:08 ` Sasha Levin
2014-06-03 1:55 ` Steven Rostedt
@ 2014-06-03 9:00 ` Peter Zijlstra
1 sibling, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2014-06-03 9:00 UTC (permalink / raw)
To: Sasha Levin
Cc: Jan Kara, linux-mm, Andrew Morton, LKML, Dave Jones,
Steven Rostedt, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 2471 bytes --]
On Sun, Jun 01, 2014 at 10:08:21AM -0400, Sasha Levin wrote:
> On 05/12/2014 12:28 PM, Jan Kara wrote:
> > On Wed 07-05-14 22:03:08, Sasha Levin wrote:
> >> > Hi all,
> >> >
> >> > While fuzzing with trinity inside a KVM tools guest running the latest -next
> >> > kernel I've stumbled on the following spew:
> > Thanks for report. So the problem seems to be maginally valid but I'm not
> > 100% sure whom to blame :). So printk() code calls up() which calls
> > try_to_wake_up() under console_sem.lock spinlock. That function can take
> > rq->lock which is all expected.
> >
> > The next part of the chain is that during CPU initialization we call
> > __sched_fork() with rq->lock which calls into hrtimer_init() which can
> > allocate memory which creates a dependency rq->lock => zone.lock.rlock.
> >
> > And memory management code calls printk() which zone.lock.rlock held which
> > closes the loop. Now I suspect the second link in the chain can happen only
> > while CPU is booting and might even happen only if some debug options are
> > enabled. But I don't really know scheduler code well enough. Steven?
>
> I've cc'ed Peter and Ingo who may be able to answer that, as it still happens
> on -next.
Ah, cute.
So the second paragraph seems to miss the detail that this is the
__sched_fork() call from init_idle(), all other callers don't actually
hold the rq->lock.
Now init_idle() is called from:
sched_init()
fork_idle()
idle_thread_get()
Now fork_idle() is called from:
smp_init() -> idle_threads_init() -> idle_init()
and idle_thread_get is called from:
_cpu_up()
So while it looks we're calling __sched_fork() twice for every !boot
idle thread (urgh) we do appear to call it before anything is running on
that cpu, so I don't see any particular problem with removing the call
from under that lock.
Something like so should do I suppose.
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240aa83e73f5..99609c33482b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4505,9 +4505,10 @@ void init_idle(struct task_struct *idle, int cpu)
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
+ __sched_fork(0, idle);
+
raw_spin_lock_irqsave(&rq->lock, flags);
- __sched_fork(0, idle);
idle->state = TASK_RUNNING;
idle->se.exec_start = sched_clock();
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-06-03 9:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-08 2:03 mm,console: circular dependency between console_sem and zone lock Sasha Levin
2014-05-12 16:28 ` Jan Kara
2014-06-01 14:08 ` Sasha Levin
2014-06-03 1:55 ` Steven Rostedt
2014-06-03 8:47 ` Peter Zijlstra
2014-06-03 9:00 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).