* [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads
@ 2014-02-28  6:23 Mike Galbraith
  2014-02-28  9:00 ` Pavel Vasilyev
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Mike Galbraith @ 2014-02-28  6:23 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mel Gorman, LKML, RT


Bad idea:
[  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
[  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
[  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
[  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
[  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
[  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
[  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
[  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
[  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
[  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
[  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf

Signed-off-by: Mike Galbraith <bitbucket@online.de>

diff --git a/kernel/fork.c b/kernel/fork.c
index a17621c6cd42..332688e5e7b4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	task_numa_free(tsk);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6edbef296ece..94a963fd040e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2149,8 +2149,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
-		task_numa_free(prev);
-
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 




* Re: [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads
  2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
@ 2014-02-28  9:00 ` Pavel Vasilyev
  2014-02-28 11:32 ` Peter Zijlstra
  2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
  2 siblings, 0 replies; 12+ messages in thread
From: Pavel Vasilyev @ 2014-02-28  9:00 UTC (permalink / raw)
  Cc: RT

On 28.02.2014 10:23, Mike Galbraith wrote:

Maybe

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -237,6 +237,9 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+#ifdef CONFIG_NUMA
+	task_numa_free(tsk);
+#endif
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
...
...
...


-- 

                                                         Pavel.
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads
  2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
  2014-02-28  9:00 ` Pavel Vasilyev
@ 2014-02-28 11:32 ` Peter Zijlstra
  2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
  2 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2014-02-28 11:32 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Mel Gorman, LKML, RT

On Fri, Feb 28, 2014 at 07:23:11AM +0100, Mike Galbraith wrote:
> 
> Bad idea:
> [  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
> [  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
> [  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
> [  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
> [  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
> [  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
> [  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
> [  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
> [  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
> [  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
> [  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf
> 
> Signed-off-by: Mike Galbraith <bitbucket@online.de>

Fair enough; thanks!


* [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
  2014-02-28  9:00 ` Pavel Vasilyev
  2014-02-28 11:32 ` Peter Zijlstra
@ 2014-03-11 12:40 ` tip-bot for Mike Galbraith
  2014-04-06 19:17   ` Sasha Levin
  2 siblings, 1 reply; 12+ messages in thread
From: tip-bot for Mike Galbraith @ 2014-03-11 12:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, bitbucket, mgorman,
	akpm, tglx

Commit-ID:  156654f491dd8d52687a5fbe1637f472a52ce75b
Gitweb:     http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b
Author:     Mike Galbraith <bitbucket@online.de>
AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 11 Mar 2014 12:05:43 +0100

sched/numa: Move task_numa_free() to __put_task_struct()

Bad idea on -rt:

[  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
[  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
[  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
[  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
[  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
[  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
[  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
[  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
[  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
[  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
[  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf

and since upstream does not mind where we do this, be a bit nicer ...

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/fork.c       | 1 +
 kernel/sched/core.c | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index a17621c..332688e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -237,6 +237,7 @@ void __put_task_struct(struct task_struct *tsk)
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	task_numa_free(tsk);
 	security_task_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd89c27..9e126a2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2151,8 +2151,6 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
-		task_numa_free(prev);
-
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
 


* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
@ 2014-04-06 19:17   ` Sasha Levin
  2014-04-07  5:29     ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2014-04-06 19:17 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, torvalds, peterz, bitbucket, mgorman,
	akpm, tglx, linux-tip-commits
  Cc: Dave Jones

On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote:
> Commit-ID:  156654f491dd8d52687a5fbe1637f472a52ce75b
> Gitweb:     http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b
> Author:     Mike Galbraith <bitbucket@online.de>
> AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 11 Mar 2014 12:05:43 +0100
> 
> sched/numa: Move task_numa_free() to __put_task_struct()
> 
> Bad idea on -rt:
> 
> [  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
> [  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
> [  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
> [  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
> [  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
> [  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
> [  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
> [  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
> [  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
> [  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
> [  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf
> 
> and since upstream does not mind where we do this, be a bit nicer ...
> 
> Signed-off-by: Mike Galbraith <bitbucket@online.de>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Mel Gorman <mgorman@suse.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

As it seems, upstream does mind:

[ 2590.260734] ======================================================
[ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
[ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G        W
[ 2590.263846] ------------------------------------------------------
[ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714)
[ 2590.267179]
[ 2590.267179] and this task is already holding:
[ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
[ 2590.269381] which would create a new lock dependency:
[ 2590.270067]  (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...}
[ 2590.270067]
[ 2590.270067] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2590.270067]  (&(&new_timer->it_lock)->rlock){-.....}
... which became HARDIRQ-irq-safe at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437)
[ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2))
[ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892)
[ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
[ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945)
[ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164)
[ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
[ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302)
[ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
[ 2590.270067] cpu_startup_entry (??:?)
[ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267)
[ 2590.270067]
[ 2590.270067] to a HARDIRQ-irq-unsafe lock:
[ 2590.270067]  (&(&grp->lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
[ 2590.270067] ... __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]
[ 2590.270067] other info that might help us debug this:
[ 2590.270067]
[ 2590.270067]  Possible interrupt unsafe locking scenario:
[ 2590.270067]
[ 2590.270067]        CPU0                    CPU1
[ 2590.270067]        ----                    ----
[ 2590.270067]   lock(&(&grp->lock)->rlock);
[ 2590.270067]                                local_irq_disable();
[ 2590.270067]                                lock(&(&new_timer->it_lock)->rlock);
[ 2590.270067]                                lock(&(&grp->lock)->rlock);
[ 2590.270067]   <Interrupt>
[ 2590.270067]     lock(&(&new_timer->it_lock)->rlock);
[ 2590.270067]
[ 2590.270067]  *** DEADLOCK ***
[ 2590.270067]
[ 2590.270067] 1 lock held by trinity-c244/1210:
[ 2590.270067] #0: (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
[ 2590.270067]
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
[ 2590.270067] -> (&(&new_timer->it_lock)->rlock){-.....} ops: 361 {
[ 2590.270067]    IN-HARDIRQ-W at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2783 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 2590.270067] posix_timer_fn (kernel/posix-timers.c:437)
[ 2590.270067] __run_hrtimer (kernel/hrtimer.c:1245 (discriminator 2))
[ 2590.270067] hrtimer_interrupt (kernel/hrtimer.c:1892)
[ 2590.270067] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:921)
[ 2590.270067] smp_apic_timer_interrupt (arch/x86/include/asm/apic.h:696 arch/x86/kernel/apic/apic.c:945)
[ 2590.270067] apic_timer_interrupt (arch/x86/kernel/entry_64.S:1164)
[ 2590.270067] default_idle (arch/x86/include/asm/paravirt.h:111 arch/x86/kernel/process.c:310)
[ 2590.270067] arch_cpu_idle (arch/x86/kernel/process.c:302)
[ 2590.270067] cpu_idle_loop (kernel/sched/idle.c:179 kernel/sched/idle.c:226)
[ 2590.270067] cpu_startup_entry (??:?)
[ 2590.270067] start_secondary (arch/x86/kernel/smpboot.c:267)
[ 2590.270067]    INITIAL USE at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)
[ 2590.270067]  }
[ 2590.270067] ... key at: __key.33130 (??:?)
[ 2590.270067]  ... acquired at:
[ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638)
[ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2))
[ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)
[ 2590.270067]
[ 2590.270067]
the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
[ 2590.270067] -> (&(&grp->lock)->rlock){+.+...} ops: 91 {
[ 2590.270067]    HARDIRQ-ON-W at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2800 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]    SOFTIRQ-ON-W at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:2804 kernel/locking/lockdep.c:3138)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]    INITIAL USE at:
[ 2590.270067] __lock_acquire (kernel/locking/lockdep.c:3142)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_placement (include/linux/nodemask.h:248 kernel/sched/fair.c:1504)
[ 2590.270067] task_numa_fault (kernel/sched/fair.c:1794)
[ 2590.270067] __handle_mm_fault (mm/memory.c:3796 mm/memory.c:3796 mm/memory.c:3909)
[ 2590.270067] handle_mm_fault (include/linux/memcontrol.h:148 mm/memory.c:3935)
[ 2590.270067] __do_page_fault (arch/x86/mm/fault.c:1220)
[ 2590.270067] do_page_fault (arch/x86/mm/fault.c:1272 include/linux/jump_label.h:105 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1273)
[ 2590.270067] do_async_page_fault (arch/x86/kernel/kvm.c:263)
[ 2590.270067] async_page_fault (arch/x86/kernel/entry_64.S:1496)
[ 2590.270067]  }
[ 2590.270067] ... key at: __key.32449 (??:?)
[ 2590.270067]  ... acquired at:
[ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638)
[ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2))
[ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)
[ 2590.270067]
[ 2590.270067]
[ 2590.270067] stack backtrace:
[ 2590.270067] CPU: 3 PID: 1210 Comm: trinity-c244 Tainted: G        W     3.14.0-next-20140403-sasha-00022-g10224c0 #377
[ 2590.270067]  ffffffff87a83b60 ffff880081695ad8 ffffffff844bfb3f 0000000000000000
[ 2590.270067]  ffff880081698cf0 ffff880081695be8 ffffffff811c0d05 0000000000000000
[ 2590.270067]  ffffffff00000000 ffff880000000001 ffffffff8107aac5 ffff880081695b38
[ 2590.270067] Call Trace:
[ 2590.270067] dump_stack (lib/dump_stack.c:52)
[ 2590.270067] check_usage (kernel/locking/lockdep.c:1549 kernel/locking/lockdep.c:1580)
[ 2590.270067] ? sched_clock (arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 2590.270067] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 2590.270067] check_irq_usage (kernel/locking/lockdep.c:1638)
[ 2590.270067] __lock_acquire (kernel/locking/lockdep_states.h:7 kernel/locking/lockdep.c:1844 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
[ 2590.270067] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 2590.270067] lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 2590.270067] ? task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2))
[ 2590.270067] task_numa_free (kernel/sched/fair.c:1714)
[ 2590.270067] __put_task_struct (kernel/fork.c:244 (discriminator 2))
[ 2590.270067] posix_cpu_timer_del (include/linux/sched.h:1807 kernel/posix-cpu-timers.c:409)
[ 2590.270067] exit_itimers (kernel/posix-timers.c:973 kernel/posix-timers.c:998)
[ 2590.270067] do_exit (kernel/exit.c:766)
[ 2590.270067] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 2590.270067] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2557 kernel/locking/lockdep.c:2599)
[ 2590.270067] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[ 2590.270067] do_group_exit (kernel/exit.c:919)
[ 2590.270067] SyS_exit_group (kernel/exit.c:930)
[ 2590.270067] tracesys (arch/x86/kernel/entry_64.S:749)


Thanks,
Sasha


* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-06 19:17   ` Sasha Levin
@ 2014-04-07  5:29     ` Mike Galbraith
  2014-04-07  7:30       ` Mike Galbraith
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2014-04-07  5:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx,
	linux-tip-commits, Dave Jones

On Sun, 2014-04-06 at 15:17 -0400, Sasha Levin wrote: 
> On 03/11/2014 08:40 AM, tip-bot for Mike Galbraith wrote:
> > Commit-ID:  156654f491dd8d52687a5fbe1637f472a52ce75b
> > Gitweb:     http://git.kernel.org/tip/156654f491dd8d52687a5fbe1637f472a52ce75b
> > Author:     Mike Galbraith <bitbucket@online.de>
> > AuthorDate: Fri, 28 Feb 2014 07:23:11 +0100
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 11 Mar 2014 12:05:43 +0100
> > 
> > sched/numa: Move task_numa_free() to __put_task_struct()
> > 
> > Bad idea on -rt:
> > 
> > [  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
> > [  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
> > [  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
> > [  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
> > [  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
> > [  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
> > [  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
> > [  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
> > [  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
> > [  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
> > [  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf
> > 
> > and since upstream does not mind where we do this, be a bit nicer ...
> > 
> > Signed-off-by: Mike Galbraith <bitbucket@online.de>
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > Cc: Mel Gorman <mgorman@suse.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> As it seems, upstream does mind:
> 
> [ 2590.260734] ======================================================
> [ 2590.261695] [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
> [ 2590.262748] 3.14.0-next-20140403-sasha-00022-g10224c0 #377 Tainted: G        W
> [ 2590.263846] ------------------------------------------------------
> [ 2590.264730] trinity-c244/1210 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 2590.265783] (&(&grp->lock)->rlock){+.+...}, at: task_numa_free (kernel/sched/fair.c:1714)
> [ 2590.267179]
> [ 2590.267179] and this task is already holding:
> [ 2590.267996] (&(&new_timer->it_lock)->rlock){-.....}, at: exit_itimers (kernel/posix-timers.c:971 kernel/posix-timers.c:998)
> [ 2590.269381] which would create a new lock dependency:
> [ 2590.270067]  (&(&new_timer->it_lock)->rlock){-.....} -> (&(&grp->lock)->rlock){+.+...}

I'm not getting it.

I moved task_numa_free() from one interrupts-enabled spot to another.
But with numa=fake=4 and lockdep enabled, not only does lockdep gripe,
my little box locks up on splat.  Switching to spin_lock/unlock_irq()
did the expected: it just moved the lockdep gripe to task_numa_fault().


> [ 2590.270067]  Possible interrupt unsafe locking scenario:
> [ 2590.270067]
> [ 2590.270067]        CPU0                    CPU1
> [ 2590.270067]        ----                    ----
> [ 2590.270067]   lock(&(&grp->lock)->rlock);
> [ 2590.270067]                                local_irq_disable();
> [ 2590.270067]                                lock(&(&new_timer->it_lock)->rlock);
> [ 2590.270067]                                lock(&(&grp->lock)->rlock);
> [ 2590.270067]   <Interrupt>
> [ 2590.270067]     lock(&(&new_timer->it_lock)->rlock);
> [ 2590.270067]
> [ 2590.270067]  *** DEADLOCK ***

Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe?

-Mike



* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  5:29     ` Mike Galbraith
@ 2014-04-07  7:30       ` Mike Galbraith
  2014-04-07  8:16         ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Galbraith @ 2014-04-07  7:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: mingo, hpa, linux-kernel, torvalds, peterz, mgorman, akpm, tglx,
	linux-tip-commits, Dave Jones

On Mon, 2014-04-07 at 07:29 +0200, Mike Galbraith wrote:

> I'm not getting it.
> 
> I moved task_numa_free() from one interrupts-enabled spot to another.
> But with numa=fake=4 and lockdep enabled, not only does lockdep gripe,
> my little box locks up on splat.  Switching to spin_lock/unlock_irq()
> did the expected: it just moved the lockdep gripe to task_numa_fault().
> 
> 
> > [ 2590.270067]  Possible interrupt unsafe locking scenario:
> > [ 2590.270067]
> > [ 2590.270067]        CPU0                    CPU1
> > [ 2590.270067]        ----                    ----
> > [ 2590.270067]   lock(&(&grp->lock)->rlock);
> > [ 2590.270067]                                local_irq_disable();
> > [ 2590.270067]                                lock(&(&new_timer->it_lock)->rlock);
> > [ 2590.270067]                                lock(&(&grp->lock)->rlock);
> > [ 2590.270067]   <Interrupt>
> > [ 2590.270067]     lock(&(&new_timer->it_lock)->rlock);
> > [ 2590.270067]
> > [ 2590.270067]  *** DEADLOCK ***
> 
> Ok, so how did I manage that HARDIRQ-safe -> HARDIRQ-unsafe?

Think I'll turn lockdep off, and make context switches take a good long
while after finish_lock_switch(), but meanwhile, this made it happy.

Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made
numa_group.lock interrupt unsafe.  While I don't see how that could be, given
that the commit in question moved task_numa_free() from one irq-enabled region
to another, the patch below does make both the gripes and the lockup-upon-gripe
with numa=fake=4 go away.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Not-signed-off-by: Mike Galbraith <bitbucket@online.de>
---
 kernel/sched/fair.c  |   12 +++++++-----
 kernel/sched/sched.h |    9 +++++++++
 2 files changed, 16 insertions(+), 5 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults_memory[i];
@@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
 
 	spin_unlock(&my_grp->lock);
 	spin_unlock(&grp->lock);
+	local_irq_enable();
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1710,14 +1712,14 @@ void task_numa_free(struct task_struct *
 	void *numa_faults = p->numa_faults_memory;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults_memory[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)




* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  7:30       ` Mike Galbraith
@ 2014-04-07  8:16         ` Peter Zijlstra
  2014-04-07  8:40           ` Mike Galbraith
  2014-04-07  8:55           ` Mike Galbraith
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2014-04-07  8:16 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm,
	tglx, linux-tip-commits, Dave Jones

On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
> -	double_lock(&my_grp->lock, &grp->lock);
> +	BUG_ON(irqs_disabled());
> +	double_lock_irq(&my_grp->lock, &grp->lock);

So either make this:

	local_irq_disable();
	double_lock();

or

>  
>  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
>  		my_grp->faults[i] -= p->numa_faults_memory[i];
> @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
>  
>  	spin_unlock(&my_grp->lock);
>  	spin_unlock(&grp->lock);
> +	local_irq_enable();

use:
	spin_unlock()
	spin_unlock_irq()

or so, but this imbalance is making me itch :-)


>  
>  	rcu_assign_pointer(p->numa_group, grp);
>  


* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  8:16         ` Peter Zijlstra
@ 2014-04-07  8:40           ` Mike Galbraith
  2014-04-07  8:55           ` Mike Galbraith
  1 sibling, 0 replies; 12+ messages in thread
From: Mike Galbraith @ 2014-04-07  8:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm,
	tglx, linux-tip-commits, Dave Jones

On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: 
> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
> > -	double_lock(&my_grp->lock, &grp->lock);
> > +	BUG_ON(irqs_disabled());
> > +	double_lock_irq(&my_grp->lock, &grp->lock);
> 
> So either make this:
> 
> 	local_irq_disable();
> 	double_lock();
> 
> or
> 
> >  
> >  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
> >  		my_grp->faults[i] -= p->numa_faults_memory[i];
> > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
> >  
> >  	spin_unlock(&my_grp->lock);
> >  	spin_unlock(&grp->lock);
> > +	local_irq_enable();
> 
> use:
> 	spin_unlock()
> 	spin_unlock_irq()

*thwap*  Well duh.

> or so, but this imbalance is making me itch :-)

Yeah, much better.

Before I actually sign that off, mind cluing me in as to why I should
not be sitting here thinking lockdep smoked its breakfast?

-Mike



* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  8:16         ` Peter Zijlstra
  2014-04-07  8:40           ` Mike Galbraith
@ 2014-04-07  8:55           ` Mike Galbraith
  2014-04-13 20:53             ` Govindarajulu Varadarajan
  2014-04-14  7:22             ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
  1 sibling, 2 replies; 12+ messages in thread
From: Mike Galbraith @ 2014-04-07  8:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sasha Levin, mingo, hpa, linux-kernel, torvalds, mgorman, akpm,
	tglx, linux-tip-commits, Dave Jones

On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote: 
> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
> > -	double_lock(&my_grp->lock, &grp->lock);
> > +	BUG_ON(irqs_disabled());
> > +	double_lock_irq(&my_grp->lock, &grp->lock);
> 
> So either make this:
> 
> 	local_irq_disable();
> 	double_lock();
> 
> or
> 
> >  
> >  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
> >  		my_grp->faults[i] -= p->numa_faults_memory[i];
> > @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
> >  
> >  	spin_unlock(&my_grp->lock);
> >  	spin_unlock(&grp->lock);
> > +	local_irq_enable();
> 
> use:
> 	spin_unlock()
> 	spin_unlock_irq()
> 
> or so, but this imbalance is making me itch :-)

sched, numa: fix task_numa_free() lockdep splat

Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made
numa_group.lock interrupt unsafe.  While I don't see how that could be given the
commit in question moved task_numa_free() from one irq enabled region to another,
the below does make both gripes and lockups upon gripe with numa=fake=4 go away.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Mike Galbraith <bitbucket@online.de>
---
 kernel/sched/fair.c  |   13 +++++++------
 kernel/sched/sched.h |    9 +++++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults_memory[i];
@@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_
 	grp->nr_tasks++;
 
 	spin_unlock(&my_grp->lock);
-	spin_unlock(&grp->lock);
+	spin_unlock_irq(&grp->lock);
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *
 	void *numa_faults = p->numa_faults_memory;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults_memory[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)




* Re: [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct()
  2014-04-07  8:55           ` Mike Galbraith
@ 2014-04-13 20:53             ` Govindarajulu Varadarajan
  2014-04-14  7:22             ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
  1 sibling, 0 replies; 12+ messages in thread
From: Govindarajulu Varadarajan @ 2014-04-13 20:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Sasha Levin, mingo, hpa, linux-kernel, torvalds,
	mgorman, akpm, tglx, linux-tip-commits, Dave Jones



On Mon, 7 Apr 2014, Mike Galbraith wrote:

> On Mon, 2014-04-07 at 10:16 +0200, Peter Zijlstra wrote:
>> On Mon, Apr 07, 2014 at 09:30:30AM +0200, Mike Galbraith wrote:
>>> -	double_lock(&my_grp->lock, &grp->lock);
>>> +	BUG_ON(irqs_disabled());
>>> +	double_lock_irq(&my_grp->lock, &grp->lock);
>>
>> So either make this:
>>
>> 	local_irq_disable();
>> 	double_lock();
>>
>> or
>>
>>>
>>>  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
>>>  		my_grp->faults[i] -= p->numa_faults_memory[i];
>>> @@ -1692,6 +1693,7 @@ static void task_numa_group(struct task_
>>>
>>>  	spin_unlock(&my_grp->lock);
>>>  	spin_unlock(&grp->lock);
>>> +	local_irq_enable();
>>
>> use:
>> 	spin_unlock()
>> 	spin_unlock_irq()
>>
>> or so, but this imbalance is making me itch :-)
>
> sched, numa: fix task_numa_free() lockdep splat
>
> Sasha reports that lockdep claims 156654f491dd8d52687a5fbe1637f472a52ce75b made
> numa_group.lock interrupt unsafe.  While I don't see how that could be given the
> commit in question moved task_numa_free() from one irq enabled region to another,
> the below does make both gripes and lockups upon gripe with numa=fake=4 go away.
>

Hi

I am hitting this bug quite frequently. I do not see the problem after applying
this patch.

Thanks

Tested-by: Govindarajulu Varadarajan <govindx7c6@gmail.com>

> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Mike Galbraith <bitbucket@online.de>
> ---
> kernel/sched/fair.c  |   13 +++++++------
> kernel/sched/sched.h |    9 +++++++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1497,7 +1497,7 @@ static void task_numa_placement(struct t
> 	/* If the task is part of a group prevent parallel updates to group stats */
> 	if (p->numa_group) {
> 		group_lock = &p->numa_group->lock;
> -		spin_lock(group_lock);
> +		spin_lock_irq(group_lock);
> 	}
>
> 	/* Find the node with the highest number of faults */
> @@ -1572,7 +1572,7 @@ static void task_numa_placement(struct t
> 			}
> 		}
>
> -		spin_unlock(group_lock);
> +		spin_unlock_irq(group_lock);
> 	}
>
> 	/* Preferred node as the node with the most faults */
> @@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_
> 	if (!join)
> 		return;
>
> -	double_lock(&my_grp->lock, &grp->lock);
> +	BUG_ON(irqs_disabled());
> +	double_lock_irq(&my_grp->lock, &grp->lock);
>
> 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
> 		my_grp->faults[i] -= p->numa_faults_memory[i];
> @@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_
> 	grp->nr_tasks++;
>
> 	spin_unlock(&my_grp->lock);
> -	spin_unlock(&grp->lock);
> +	spin_unlock_irq(&grp->lock);
>
> 	rcu_assign_pointer(p->numa_group, grp);
>
> @@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *
> 	void *numa_faults = p->numa_faults_memory;
>
> 	if (grp) {
> -		spin_lock(&grp->lock);
> +		spin_lock_irq(&grp->lock);
> 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
> 			grp->faults[i] -= p->numa_faults_memory[i];
> 		grp->total_faults -= p->total_numa_faults;
>
> 		list_del(&p->numa_entry);
> 		grp->nr_tasks--;
> -		spin_unlock(&grp->lock);
> +		spin_unlock_irq(&grp->lock);
> 		rcu_assign_pointer(p->numa_group, NULL);
> 		put_numa_group(grp);
> 	}
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1388,6 +1388,15 @@ static inline void double_lock(spinlock_
> 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
> }
>
> +static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
> +{
> +	if (l1 > l2)
> +		swap(l1, l2);
> +
> +	spin_lock_irq(l1);
> +	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
> +}
> +
> static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
> {
> 	if (l1 > l2)
>
>
>


* [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat
  2014-04-07  8:55           ` Mike Galbraith
  2014-04-13 20:53             ` Govindarajulu Varadarajan
@ 2014-04-14  7:22             ` tip-bot for Mike Galbraith
  1 sibling, 0 replies; 12+ messages in thread
From: tip-bot for Mike Galbraith @ 2014-04-14  7:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, sasha.levin, hpa, mingo, peterz, bitbucket, davej, tglx

Commit-ID:  60e69eed85bb7b5198ef70643b5895c26ad76ef7
Gitweb:     http://git.kernel.org/tip/60e69eed85bb7b5198ef70643b5895c26ad76ef7
Author:     Mike Galbraith <bitbucket@online.de>
AuthorDate: Mon, 7 Apr 2014 10:55:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 11 Apr 2014 10:39:15 +0200

sched/numa: Fix task_numa_free() lockdep splat

Sasha reported that lockdep claims that the following commit made
numa_group.lock interrupt unsafe:

  156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")

While I don't see how that could be, given the commit in question moved
task_numa_free() from one irq enabled region to another, the below does
make both gripes and lockups upon gripe with numa=fake=4 go away.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: torvalds@linux-foundation.org
Cc: mgorman@suse.com
Cc: akpm@linux-foundation.org
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c  | 13 +++++++------
 kernel/sched/sched.h |  9 +++++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e9bd0b..4f14a65 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1497,7 +1497,7 @@ static void task_numa_placement(struct task_struct *p)
 	/* If the task is part of a group prevent parallel updates to group stats */
 	if (p->numa_group) {
 		group_lock = &p->numa_group->lock;
-		spin_lock(group_lock);
+		spin_lock_irq(group_lock);
 	}
 
 	/* Find the node with the highest number of faults */
@@ -1572,7 +1572,7 @@ static void task_numa_placement(struct task_struct *p)
 			}
 		}
 
-		spin_unlock(group_lock);
+		spin_unlock_irq(group_lock);
 	}
 
 	/* Preferred node as the node with the most faults */
@@ -1677,7 +1677,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	if (!join)
 		return;
 
-	double_lock(&my_grp->lock, &grp->lock);
+	BUG_ON(irqs_disabled());
+	double_lock_irq(&my_grp->lock, &grp->lock);
 
 	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {
 		my_grp->faults[i] -= p->numa_faults_memory[i];
@@ -1691,7 +1692,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
 	grp->nr_tasks++;
 
 	spin_unlock(&my_grp->lock);
-	spin_unlock(&grp->lock);
+	spin_unlock_irq(&grp->lock);
 
 	rcu_assign_pointer(p->numa_group, grp);
 
@@ -1710,14 +1711,14 @@ void task_numa_free(struct task_struct *p)
 	void *numa_faults = p->numa_faults_memory;
 
 	if (grp) {
-		spin_lock(&grp->lock);
+		spin_lock_irq(&grp->lock);
 		for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++)
 			grp->faults[i] -= p->numa_faults_memory[i];
 		grp->total_faults -= p->total_numa_faults;
 
 		list_del(&p->numa_entry);
 		grp->nr_tasks--;
-		spin_unlock(&grp->lock);
+		spin_unlock_irq(&grp->lock);
 		rcu_assign_pointer(p->numa_group, NULL);
 		put_numa_group(grp);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c9007f2..456e492 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1385,6 +1385,15 @@ static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
 	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
 }
 
+static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+{
+	if (l1 > l2)
+		swap(l1, l2);
+
+	spin_lock_irq(l1);
+	spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
+}
+
 static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
 {
 	if (l1 > l2)


end of thread, other threads:[~2014-04-14  7:23 UTC | newest]

Thread overview: 12+ messages
2014-02-28  6:23 [patch] rt,sched,numa: Move task_numa_free() to __put_task_struct(), which -rt offloads Mike Galbraith
2014-02-28  9:00 ` Pavel Vasilyev
2014-02-28 11:32 ` Peter Zijlstra
2014-03-11 12:40 ` [tip:sched/core] sched/numa: Move task_numa_free() to __put_task_struct() tip-bot for Mike Galbraith
2014-04-06 19:17   ` Sasha Levin
2014-04-07  5:29     ` Mike Galbraith
2014-04-07  7:30       ` Mike Galbraith
2014-04-07  8:16         ` Peter Zijlstra
2014-04-07  8:40           ` Mike Galbraith
2014-04-07  8:55           ` Mike Galbraith
2014-04-13 20:53             ` Govindarajulu Varadarajan
2014-04-14  7:22             ` [tip:sched/urgent] sched/numa: Fix task_numa_free() lockdep splat tip-bot for Mike Galbraith
