* [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads @ 2014-02-28 6:23 Mike Galbraith 2014-02-28 9:00 ` Pavel Vasilyev ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Mike Galbraith @ 2014-02-28 6:23 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Mel Gorman, LKML, RT Bad idea: [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf Signed-off-by: Mike Galbraith <bitbucket@online.de> diff --git a/kernel/fork.c b/kernel/fork.c index a17621c6cd42..332688e5e7b4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk) WARN_ON(atomic_read(&tsk->usage)); WARN_ON(tsk == current); + task_numa_free(tsk); security_task_free(tsk); exit_creds(tsk); delayacct_tsk_free(tsk); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 6edbef296ece..94a963fd040e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2149,8 +2149,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) if (mm) mmdrop(mm); if (unlikely(prev_state == TASK_DEAD)) { - task_numa_free(prev); - if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith @ 2014-02-28 9:00 ` Pavel Vasilyev 2014-02-28 11:32 ` Peter Zijlstra 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith 2 siblings, 0 replies; 12+ messages in thread From: Pavel Vasilyev @ 2014-02-28 9:00 UTC (permalink / raw) Cc: RT 28.02.2014 10:23, Mike Galbraith пишет: Maybe --- a/kernel/fork.c +++ b/kernel/fork.c @@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk) WARN_ON(atomic_read(&tsk->usage)); WARN_ON(tsk == current); +#ifdef CONFIG_NUMA + task_numa_free(tsk); +#endif security_task_free(tsk); exit_creds(tsk); delayacct_tsk_free(tsk); ... ... ... > -- Pavel. -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith 2014-02-28 9:00 ` Pavel Vasilyev @ 2014-02-28 11:32 ` Peter Zijlstra 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith 2 siblings, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2014-02-28 11:32 UTC (permalink / raw) To: Mike Galbraith; +Cc: Mel Gorman, LKML, RT On Fri, Feb 28, 2014 at 07:23:11AM +0100, Mike Galbraith wrote: > > Bad idea: > [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 > [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 > [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 > [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae > [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 > [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 > [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 > [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 > [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 > [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 > [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf > > Signed-off-by: Mike Galbraith <bitbucket@online.de> Fair enough; thanks! ^ permalink raw reply [flat|nested] 12+ messages in thread
* [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith 2014-02-28 9:00 ` Pavel Vasilyev 2014-02-28 11:32 ` Peter Zijlstra @ 2014-03-11 12:40 ` tip-bot for Mike Galbraith 2014-04-06 19:17 ` Sasha Levin 2 siblings, 1 reply; 12+ messages in thread From: tip-bot for Mike Galbraith @ 2014-03-11 12:40 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, torvalds, peterz, bitbucket, mgorman, akpm, tglx Commit-ID: 156654f491dd8d52687a5fbe1637f472a52ce75b Gitweb: http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b Author: Mike Galbraith <bitbucket@online.de> AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 11 Mar 2014 12:05:43 +0100 sched/numa: Move task_numa_free() to __put_task_struct() Bad idea on -rt: [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf and since upstream does not mind where we do this, be a bit nicer ... Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/fork.c | 1 + kernel/sched/core.c | 2 -- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index a17621c..332688e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk) WARN_ON(atomic_read(&tsk->usage)); WARN_ON(tsk == current); + task_numa_free(tsk); security_task_free(tsk); exit_creds(tsk); delayacct_tsk_free(tsk); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index dd89c27..9e126a2 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2151,8 +2151,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev) if (mm) mmdrop(mm); if (unlikely(prev_state == TASK_DEAD)) { - task_numa_free(prev); - if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith @ 2014-04-06 19:17 ` Sasha Levin 2014-04-07 5:29 ` Mike Galbraith 0 siblings, 1 reply; 12+ messages in thread From: Sasha Levin @ 2014-04-06 19:17 UTC (permalink / raw) To: mingo, hpa, linux-kernel, torvalds, peterz, bitbucket, mgorman, akpm, tglx, linux-tip-commits Cc: Dave Jones On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote: > Commit-ID: 156654f491dd8d52687a5fbe1637f472a52ce75b > Gitweb: http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b > Author: Mike Galbraith <bitbucket@online.de> > AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100 > Committer: Ingo Molnar <mingo@kernel.org> > CommitDate: Tue, 11 Mar 2014 12:05:43 +0100 > > sched/numa: Move task_numa_free() to __put_task_struct() > > Bad idea on -rt: > > [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 > [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 > [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 > [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae > [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 > [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 > [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 > [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 > [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 > [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 > [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf > > and since upstream does not mind where we do this, be a bit nicer ... > > Signed-off-by: Mike Galbraith <bitbucket@online.de> > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > Cc: Mel Gorman <mgorman@suse.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net > Signed-off-by: Ingo Molnar <mingo@kernel.org> As it seems, upstream does mind: [ 2590.260734] ====================================================== [ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] [ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G W [ 2590.263846] ------------------------------------------------------ [ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714) [ 2590.267179] [ 2590.267179] and this task is already holding: [ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) [ 2590.269381] which would create a new lock dependency: [ 2590.270067] (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...} [ 2590.270067] [ 2590.270067] but this new dependency connects a HARDIRQ-irq-safe lock: [ 2590.270067] (&(&new_timer->it_lock)->rlock){-.....} ... which became HARDIRQ-irq-safe at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159) [ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437) [ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2)) [ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892) [ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921) [ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945) [ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164) [ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310) [ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302) [ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226) [ 2590.270067] cpu_startup_entry (??:?) [ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267) [ 2590.270067] [ 2590.270067] to a HARDIRQ-irq-unsafe lock: [ 2590.270067] (&(&grp->lock)->rlock){+.+...} ... which became HARDIRQ-irq-unsafe at: [ 2590.270067] ... __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] [ 2590.270067] other info that might help us debug this: [ 2590.270067] [ 2590.270067] Possible interrupt unsafe locking scenario: [ 2590.270067] [ 2590.270067] CPU0 CPU1 [ 2590.270067] ---- ---- [ 2590.270067] lock(&(&grp->lock)->rlock); [ 2590.270067] local_irq_disable(); [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); [ 2590.270067] lock(&(&grp->lock)->rlock); [ 2590.270067] <Interrupt> [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); [ 2590.270067] [ 2590.270067] *** DEADLOCK *** [ 2590.270067] [ 2590.270067] 1 lock held by trinity-c244/1210: [ 2590.270067] #0: (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) [ 2590.270067] the dependencies between HARDIRQ-irq-safe lock and the holding lock: [ 2590.270067] -> (&(&new_timer->it_lock)->rlock){-.....} ops: 361 { [ 2590.270067] IN-HARDIRQ-W at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159) [ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437) [ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2)) [ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892) [ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921) [ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945) [ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164) [ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310) [ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302) [ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226) [ 2590.270067] cpu_startup_entry (??:?) [ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267) [ 2590.270067] INITIAL USE at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159) [ 2590.270067] exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) [ 2590.270067] } [ 2590.270067] ... key at: __key.33130 (??:?) [ 2590.270067] ... acquired at: [ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638) [ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2)) [ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409) [ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) [ 2590.270067] [ 2590.270067] the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock: [ 2590.270067] -> (&(&grp->lock)->rlock){+.+...} ops: 91 { [ 2590.270067] HARDIRQ-ON-W at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] SOFTIRQ-ON-W at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2804 kernel/locking/lockdep.c:3138) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] INITIAL USE at: [ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504) [ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794) [ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909) [ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935) [ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220) [ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273) [ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263) [ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496) [ 2590.270067] } [ 2590.270067] ... key at: __key.32449 (??:?) [ 2590.270067] ... acquired at: [ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638) [ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2)) [ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409) [ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) [ 2590.270067] [ 2590.270067] [ 2590.270067] stack backtrace: [ 2590.270067] CPU: 3 PID: 1210 Comm: trinity-c244 Tainted: G W 3.14.0-next-20140403-sasha-00022-g10224c0 #377 [ 2590.270067] ffffffff87a83b60 ffff880081695ad8 ffffffff844bfb3f 0000000000000000 [ 2590.270067] ffff880081698cf0 ffff880081695be8 ffffffff811c0d05 0000000000000000 [ 2590.270067] ffffffff00000000 ffff880000000001 ffffffff8107aac5 ffff880081695b38 [ 2590.270067] Call Trace: [ 2590.270067] dump_stack (lib/dump_stack.c:52) [ 2590.270067] check_usage (kernel/locking/lockdep.c:1549 kernel/locking/lockdep.c:1580) [ 2590.270067] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305) [ 2590.270067] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638) [ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182) [ 2590.270067] ? __lock_acquire (kernel/locking/lockdep.c:3189) [ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) [ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) [ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2)) [ 2590.270067] task_numa_free (kernel/sched/fair.c:1714) [ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2)) [ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409) [ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998) [ 2590.270067] do_exit (kernel/exit.c:766) [ 2590.270067] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 2590.270067] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599) [ 2590.270067] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) [ 2590.270067] do_group_exit (kernel/exit.c:919) [ 2590.270067] SyS_exit_group (kernel/exit.c:930) [ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749) Thanks, Sasha ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-06 19:17 ` Sasha Levin @ 2014-04-07 5:29 ` Mike Galbraith 2014-04-07 7:30 ` Mike Galbraith 0 siblings, 1 reply; 12+ messages in thread From: Mike Galbraith @ 2014-04-07 5:29 UTC (permalink / raw) To: Sasha Levin Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Sun, 2014-04-06 at 15:17 -0400, Sasha Levin wrote: > On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote: > > Commit-ID: 156654f491dd8d52687a5fbe1637f472a52ce75b > > Gitweb: http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b > > Author: Mike Galbraith <bitbucket@online.de> > > AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100 > > Committer: Ingo Molnar <mingo@kernel.org> > > CommitDate: Tue, 11 Mar 2014 12:05:43 +0100 > > > > sched/numa: Move task_numa_free() to __put_task_struct() > > > > Bad idea on -rt: > > > > [ 908.026136] [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0 > > [ 908.026145] [<ffffffff8108f701>] task_numa_free+0x31/0x130 > > [ 908.026151] [<ffffffff8108121e>] finish_task_switch+0xce/0x100 > > [ 908.026156] [<ffffffff81509c0a>] thread_return+0x48/0x4ae > > [ 908.026160] [<ffffffff8150a095>] schedule+0x25/0xa0 > > [ 908.026163] [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0 > > [ 908.026170] [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680 > > [ 908.026175] [<ffffffff8100242d>] do_signal+0x3d/0x5b0 > > [ 908.026179] [<ffffffff81002a30>] do_notify_resume+0x90/0xe0 > > [ 908.026186] [<ffffffff81513176>] int_signal+0x12/0x17 > > [ 908.026193] [<00007ff2a388b1d0>] 0x7ff2a388b1cf > > > > and since upstream does not mind where we do this, be a bit nicer ... > > > > Signed-off-by: Mike Galbraith <bitbucket@online.de> > > Signed-off-by: Peter Zijlstra <peterz@infradead.org> > > Cc: Mel Gorman <mgorman@suse.com> > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > As it seems, upstream does mind: > > [ 2590.260734] ====================================================== > [ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] > [ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G W > [ 2590.263846] ------------------------------------------------------ > [ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: > [ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714) > [ 2590.267179] > [ 2590.267179] and this task is already holding: > [ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998) > [ 2590.269381] which would create a new lock dependency: > [ 2590.270067] (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...} I'm not getting it. I moved task_numa_free() from one interrupts enabled spot to another. But, with numa=fake=4 and lockdep enabled, not only does lockdep gripe, my little box locks up on splat. Saying spin_lock/unlock_irq() did the expected, just moved lockdep gripe to task_numa_fault(). > [ 2590.270067] Possible interrupt unsafe locking scenario: > [ 2590.270067] > [ 2590.270067] CPU0 CPU1 > [ 2590.270067] ---- ---- > [ 2590.270067] lock(&(&grp->lock)->rlock); > [ 2590.270067] local_irq_disable(); > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > [ 2590.270067] lock(&(&grp->lock)->rlock); > [ 2590.270067] <Interrupt> > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > [ 2590.270067] > [ 2590.270067] *** DEADLOCK *** Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe? -Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 5:29 ` Mike Galbraith @ 2014-04-07 7:30 ` Mike Galbraith 2014-04-07 8:16 ` Peter Zijlstra 0 siblings, 1 reply; 12+ messages in thread From: Mike Galbraith @ 2014-04-07 7:30 UTC (permalink / raw) To: Sasha Levin Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 2014-04-07 at 07:29 +0200, Mike Galbraith wrote: > I'm not getting it. > > I moved task_numa_free() from one interrupts enabled spot to another. > But, with numa=fake=4 and lockdep enabled, not only does lockdep gripe, > my little box locks up on splat. Saying spin_lock/unlock_irq() did the > expected, just moved lockdep gripe to task_numa_fault(). > > > > [ 2590.270067] Possible interrupt unsafe locking scenario: > > [ 2590.270067] > > [ 2590.270067] CPU0 CPU1 > > [ 2590.270067] ---- ---- > > [ 2590.270067] lock(&(&grp->lock)->rlock); > > [ 2590.270067] local_irq_disable(); > > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > > [ 2590.270067] lock(&(&grp->lock)->rlock); > > [ 2590.270067] <Interrupt> > > [ 2590.270067] lock(&(&new_timer->it_lock)->rlock); > > [ 2590.270067] > > [ 2590.270067] *** DEADLOCK *** > > Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe? Think I'll turn lockdep off, and make context switches take a good long while after finish_lock_switch(), but meanwhile, this made it happy. Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made numa_group.lock interrupt unsafe. While I don't see how that could be given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockup upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Not-signed-off-by: Mike Galbraith <bitbucket@online.de> --- kernel/sched/fair.c | 12 +++++++----- kernel/sched/sched.h | 9 +++++++++ 2 files changed, 16 insertions(+), 5 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t /* If the task is part of a group prevent parallel updates to group stats */ if (p->numa_group) { group_lock = &p->numa_group->lock; - spin_lock(group_lock); + spin_lock_irq(group_lock); } /* Find the node with the highest number of faults */ @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t } } - spin_unlock(group_lock); + spin_unlock_irq(group_lock); } /* Preferred node as the node with the most faults */ @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_ if (!join) return; - double_lock(&my_grp->lock, &grp->lock); + BUG_ON(irqs_disabled()); + double_lock_irq(&my_grp->lock, &grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { my_grp->faults[i] -= p->numa_faults_memory[i]; @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ spin_unlock(&my_grp->lock); spin_unlock(&grp->lock); + local_irq_enable(); rcu_assign_pointer(p->numa_group, grp); @@ -1710,14 +1712,14 @@ void task_numa_free(struct task_struct * void *numa_faults = p->numa_faults_memory; if (grp) { - spin_lock(&grp->lock); + spin_lock_irq(&grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) grp->faults[i] -= p->numa_faults_memory[i]; grp->total_faults -= p->total_numa_faults; list_del(&p->numa_entry); grp->nr_tasks--; - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, NULL); put_numa_group(grp); } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_ spin_lock_nested(l2, SINGLE_DEPTH_NESTING); } +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock_irq(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) { if (l1 > l2) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 7:30 ` Mike Galbraith @ 2014-04-07 8:16 ` Peter Zijlstra 2014-04-07 8:40 ` Mike Galbraith 2014-04-07 8:55 ` Mike Galbraith 0 siblings, 2 replies; 12+ messages in thread From: Peter Zijlstra @ 2014-04-07 8:16 UTC (permalink / raw) To: Mike Galbraith Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: > - double_lock(&my_grp->lock, &grp->lock); > + BUG_ON(irqs_disabled()); > + double_lock_irq(&my_grp->lock, &grp->lock); So either make this: local_irq_disable(); double_lock(); or > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > my_grp->faults[i] -= p->numa_faults_memory[i]; > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ > > spin_unlock(&my_grp->lock); > spin_unlock(&grp->lock); > + local_irq_enable(); use: spin_unlock() spin_unlock_irq() or so, but this imbalance is making my itch :-) > > rcu_assign_pointer(p->numa_group, grp); > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 8:16 ` Peter Zijlstra @ 2014-04-07 8:40 ` Mike Galbraith 2014-04-07 8:55 ` Mike Galbraith 1 sibling, 0 replies; 12+ messages in thread From: Mike Galbraith @ 2014-04-07 8:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: > On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: > > - double_lock(&my_grp->lock, &grp->lock); > > + BUG_ON(irqs_disabled()); > > + double_lock_irq(&my_grp->lock, &grp->lock); > > So either make this: > > local_irq_disable(); > double_lock(); > > or > > > > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > > my_grp->faults[i] -= p->numa_faults_memory[i]; > > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ > > > > spin_unlock(&my_grp->lock); > > spin_unlock(&grp->lock); > > + local_irq_enable(); > > use: > spin_unlock() > spin_unlock_irq() *thwap* Well duh. > or so, but this imbalance is making my itch :-) Yeah, much better. Before I actually sign that off, mind cluing me in as to why I should not be sitting here thinking lockdep smoked its breakfast? -Mike ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 8:16 ` Peter Zijlstra 2014-04-07 8:40 ` Mike Galbraith @ 2014-04-07 8:55 ` Mike Galbraith 2014-04-13 20:53 ` Govindarajulu Varadarajan 2014-04-14 7:22 ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith 1 sibling, 2 replies; 12+ messages in thread From: Mike Galbraith @ 2014-04-07 8:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: > On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: > > - double_lock(&my_grp->lock, &grp->lock); > > + BUG_ON(irqs_disabled()); > > + double_lock_irq(&my_grp->lock, &grp->lock); > > So either make this: > > local_irq_disable(); > double_lock(); > > or > > > > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > > my_grp->faults[i] -= p->numa_faults_memory[i]; > > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ > > > > spin_unlock(&my_grp->lock); > > spin_unlock(&grp->lock); > > + local_irq_enable(); > > use: > spin_unlock() > spin_unlock_irq() > > or so, but this imbalance is making my itch :-) sched, numa: fix task_numa_free() lockdep splat Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made numa_group.lock interrupt unsafe. While I don't see how that could be given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockups upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Mike Galbraith <bitbucket@online.de> --- kernel/sched/fair.c | 13 +++++++------ kernel/sched/sched.h | 9 +++++++++ 2 files changed, 16 insertions(+), 6 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t /* If the task is part of a group prevent parallel updates to group stats */ if (p->numa_group) { group_lock = &p->numa_group->lock; - spin_lock(group_lock); + spin_lock_irq(group_lock); } /* Find the node with the highest number of faults */ @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t } } - spin_unlock(group_lock); + spin_unlock_irq(group_lock); } /* Preferred node as the node with the most faults */ @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_ if (!join) return; - double_lock(&my_grp->lock, &grp->lock); + BUG_ON(irqs_disabled()); + double_lock_irq(&my_grp->lock, &grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { my_grp->faults[i] -= p->numa_faults_memory[i]; @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_ grp->nr_tasks++; spin_unlock(&my_grp->lock); - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, grp); @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct * void *numa_faults = p->numa_faults_memory; if (grp) { - spin_lock(&grp->lock); + spin_lock_irq(&grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) grp->faults[i] -= p->numa_faults_memory[i]; grp->total_faults -= p->total_numa_faults; list_del(&p->numa_entry); grp->nr_tasks--; - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, NULL); put_numa_group(grp); } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_ spin_lock_nested(l2, SINGLE_DEPTH_NESTING); } +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock_irq(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) { if (l1 > l2) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() 2014-04-07 8:55 ` Mike Galbraith @ 2014-04-13 20:53 ` Govindarajulu Varadarajan 2014-04-14 7:22 ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith 1 sibling, 0 replies; 12+ messages in thread From: Govindarajulu Varadarajan @ 2014-04-13 20:53 UTC (permalink / raw) To: Mike Galbraith Cc: Peter Zijlstra, Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm, tglx, linux-tip-commits, Dave Jones On Mon, 7 Apr 2014, Mike Galbraith wrote: > On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: >> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote: >>> - double_lock(&my_grp->lock, &grp->lock); >>> + BUG_ON(irqs_disabled()); >>> + double_lock_irq(&my_grp->lock, &grp->lock); >> >> So either make this: >> >> local_irq_disable(); >> double_lock(); >> >> or >> >>> >>> for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { >>> my_grp->faults[i] -= p->numa_faults_memory[i]; >>> @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_ >>> >>> spin_unlock(&my_grp->lock); >>> spin_unlock(&grp->lock); >>> + local_irq_enable(); >> >> use: >> spin_unlock() >> spin_unlock_irq() >> >> or so, but this imbalance is making my itch :-) > > sched, numa: fix task_numa_free() lockdep splat > > Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made > numa_group.lock interrupt unsafe. While I don't see how that could be given the > commit in question moved task_numa_free() from one irq enabled region to another, > the below does make both gripes and lockups upon gripe with numa=fake=4 go away. > Hi I Am hitting this bug quite frequently. I do not see the problem after applying this patch. Thanks Tested-by: Govindarajulu Varadarajan <govindx7c6@gmail.com> > Reported-by: Sasha Levin <sasha.levin@oracle.com> > Signed-off-by: Mike Galbraith <bitbucket@online.de> > --- > kernel/sched/fair.c | 13 +++++++------ > kernel/sched/sched.h | 9 +++++++++ > 2 files changed, 16 insertions(+), 6 deletions(-) > > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t > /* If the task is part of a group prevent parallel updates to group stats */ > if (p->numa_group) { > group_lock = &p->numa_group->lock; > - spin_lock(group_lock); > + spin_lock_irq(group_lock); > } > > /* Find the node with the highest number of faults */ > @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t > } > } > > - spin_unlock(group_lock); > + spin_unlock_irq(group_lock); > } > > /* Preferred node as the node with the most faults */ > @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_ > if (!join) > return; > > - double_lock(&my_grp->lock, &grp->lock); > + BUG_ON(irqs_disabled()); > + double_lock_irq(&my_grp->lock, &grp->lock); > > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { > my_grp->faults[i] -= p->numa_faults_memory[i]; > @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_ > grp->nr_tasks++; > > spin_unlock(&my_grp->lock); > - spin_unlock(&grp->lock); > + spin_unlock_irq(&grp->lock); > > rcu_assign_pointer(p->numa_group, grp); > > @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct * > void *numa_faults = p->numa_faults_memory; > > if (grp) { > - spin_lock(&grp->lock); > + spin_lock_irq(&grp->lock); > for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) > grp->faults[i] -= p->numa_faults_memory[i]; > grp->total_faults -= p->total_numa_faults; > > list_del(&p->numa_entry); > grp->nr_tasks--; > - spin_unlock(&grp->lock); > + spin_unlock_irq(&grp->lock); > rcu_assign_pointer(p->numa_group, NULL); > put_numa_group(grp); > } > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_ > spin_lock_nested(l2, SINGLE_DEPTH_NESTING); > } > > +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) > +{ > + if (l1 > l2) > + swap(l1, l2); > + > + spin_lock_irq(l1); > + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); > +} > + > static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) > { > if (l1 > l2) > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat 2014-04-07 8:55 ` Mike Galbraith 2014-04-13 20:53 ` Govindarajulu Varadarajan @ 2014-04-14 7:22 ` tip-bot for Mike Galbraith 1 sibling, 0 replies; 12+ messages in thread From: tip-bot for Mike Galbraith @ 2014-04-14 7:22 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, sasha.levin, hpa, mingo, peterz, bitbucket, davej, tglx Commit-ID: 60e69eed85bb7b5198ef70643b5895c26ad76ef7 Gitweb: http://git.kernel.org/tip/60e69eed85bb7b5198ef70643b5895c26ad76ef7 Author: Mike Galbraith <bitbucket@online.de> AuthorDate: Mon, 7 Apr 2014 10:55:15 +0200 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Fri, 11 Apr 2014 10:39:15 +0200 sched/numa: Fix task_numa_free() lockdep splat Sasha reported that lockdep claims that the following commit: made numa_group.lock interrupt unsafe: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") While I don't see how that could be, given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockups upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Cc: mgorman@suse.com Cc: akpm@linux-foundation.org Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/sched/fair.c | 13 +++++++------ kernel/sched/sched.h | 9 +++++++++ 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7e9bd0b..4f14a65 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct task_struct *p) /* If the task is part of a group prevent parallel updates to group stats */ if (p->numa_group) { group_lock = &p->numa_group->lock; - spin_lock(group_lock); + spin_lock_irq(group_lock); } /* Find the node with the highest number of faults */ @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct task_struct *p) } } - spin_unlock(group_lock); + spin_unlock_irq(group_lock); } /* Preferred node as the node with the most faults */ @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags, if (!join) return; - double_lock(&my_grp->lock, &grp->lock); + BUG_ON(irqs_disabled()); + double_lock_irq(&my_grp->lock, &grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) { my_grp->faults[i] -= p->numa_faults_memory[i]; @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags, grp->nr_tasks++; spin_unlock(&my_grp->lock); - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, grp); @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *p) void *numa_faults = p->numa_faults_memory; if (grp) { - spin_lock(&grp->lock); + spin_lock_irq(&grp->lock); for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) grp->faults[i] -= p->numa_faults_memory[i]; grp->total_faults -= p->total_numa_faults; list_del(&p->numa_entry); grp->nr_tasks--; - spin_unlock(&grp->lock); + spin_unlock_irq(&grp->lock); rcu_assign_pointer(p->numa_group, NULL); put_numa_group(grp); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index c9007f2..456e492 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1385,6 +1385,15 @@ static inline void double_lock(spinlock_t *l1, spinlock_t *l2) spin_lock_nested(l2, SINGLE_DEPTH_NESTING); } +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2) +{ + if (l1 > l2) + swap(l1, l2); + + spin_lock_irq(l1); + spin_lock_nested(l2, SINGLE_DEPTH_NESTING); +} + static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2) { if (l1 > l2) ^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-04-14 7:23 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-02-28 6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith 2014-02-28 9:00 ` Pavel Vasilyev 2014-02-28 11:32 ` Peter Zijlstra 2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith 2014-04-06 19:17 ` Sasha Levin 2014-04-07 5:29 ` Mike Galbraith 2014-04-07 7:30 ` Mike Galbraith 2014-04-07 8:16 ` Peter Zijlstra 2014-04-07 8:40 ` Mike Galbraith 2014-04-07 8:55 ` Mike Galbraith 2014-04-13 20:53 ` Govindarajulu Varadarajan 2014-04-14 7:22 ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.