linux-kernel.vger.kernel.org archive mirror
From: Sasha Levin <sasha.levin@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kirill Tkhai <ktkhai@parallels.com>,
	linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Vladimir Davydov <vdavydov@parallels.com>,
	Kirill Tkhai <tkhai@yandex.ru>
Subject: Re: [PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign()
Date: Tue, 18 Nov 2014 12:30:31 -0500	[thread overview]
Message-ID: <546B8237.5000107@oracle.com> (raw)
In-Reply-To: <5466BCAF.4020706@oracle.com>

On 11/14/2014 09:38 PM, Sasha Levin wrote:
> On 11/10/2014 11:03 AM, Peter Zijlstra wrote:
>> On Fri, Nov 07, 2014 at 10:48:27PM -0500, Sasha Levin wrote:
>>> [  829.539183] BUG: spinlock recursion on CPU#10, trinity-c594/11067
>>> [  829.539203]  lock: 0xffff880631dd6b80, .magic: dead4ead, .owner: trinity-c594/11067, .owner_cpu: 13
>> Ooh, look at that. CPU#10 vs .owner_cpu: 13 on the _same_ task.
>>
>> One of those again :/
> This actually reproduces pretty easily. The stack trace seems to be different
> but the end result is the same as above. Anything we can do to debug it? I'm
> really not sure how just the owner_cpu can be different here.

I've added code that saves stack traces on raw spinlock lock/unlock. I had to keep it
at 8 entries, but that's enough to get the gist of it. The trace I'm seeing is below,
after a rough sketch of the instrumentation:
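
Roughly, the instrumentation looks like this (a sketch only; the struct and helper
names here are made up for illustration, the real patch just records an 8-entry trace
on lock and on unlock and prints both from spin_dump()):

	#include <linux/stacktrace.h>

	#define LOCK_TRACE_ENTRIES	8

	/* hung off the CONFIG_DEBUG_SPINLOCK part of raw_spinlock_t */
	struct lock_trace {
		unsigned long		entries[LOCK_TRACE_ENTRIES];
		struct stack_trace	trace;
	};

	static void lock_trace_save(struct lock_trace *lt)
	{
		lt->trace.nr_entries	= 0;
		lt->trace.max_entries	= LOCK_TRACE_ENTRIES;
		lt->trace.entries	= lt->entries;
		lt->trace.skip		= 0;
		save_stack_trace(&lt->trace);
	}

	/*
	 * do_raw_spin_lock()/do_raw_spin_trylock() call
	 * lock_trace_save(&lock->lock_trace) once they own the lock,
	 * do_raw_spin_unlock() calls lock_trace_save(&lock->unlock_trace),
	 * and spin_dump() dumps both with print_stack_trace().
	 */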

[ 1239.788220] BUG: spinlock recursion on CPU#0, trinity-c767/32076
[ 1239.788270] irq event stamp: 9135954
[ 1239.788455] hardirqs last enabled at (9135953): free_pages_prepare (mm/page_alloc.c:770)
[ 1239.788573] hardirqs last disabled at (9135954): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:109 kernel/locking/spinlock.c:159)
[ 1239.788687] softirqs last enabled at (9134166): __do_softirq (./arch/x86/include/asm/preempt.h:22 kernel/softirq.c:296)
[ 1239.788796] softirqs last disabled at (9134161): irq_exit (kernel/softirq.c:346 kernel/softirq.c:387)
[ 1239.791152]  lock: 0xffff8805c3dd9100, .magic: dead4ead, .owner: trinity-c767/32076, .owner_cpu: -1
[ 1239.791152] lock trace:
[ 1239.791152] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[ 1239.791152] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:171)
[ 1239.791152] _raw_spin_lock_irq (include/linux/spinlock_api_smp.h:129 kernel/locking/spinlock.c:167)
[ 1239.791152] __schedule (kernel/sched/core.c:2803)
[ 1239.791152] schedule (kernel/sched/core.c:2870)
[ 1239.791152] p9_client_rpc (net/9p/client.c:756 (discriminator 13))
[ 1239.791152] p9_client_read (net/9p/client.c:1582)
[ 1239.791152] v9fs_fid_readn (fs/9p/vfs_file.c:386)
[ 1239.791152] unlock trace:
[ 1239.791152] save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[ 1239.791152] do_raw_spin_unlock (kernel/locking/spinlock_debug.c:120 include/linux/jump_label.h:114 ./arch/x86/include/asm/spinlock.h:149 kernel/locking/spinlock_debug.c:176)
[ 1239.791152] _raw_spin_unlock_irq (include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199)
[ 1239.791152] finish_task_switch (./arch/x86/include/asm/current.h:14 kernel/sched/core.c:2251)
[ 1239.791152] __schedule (kernel/sched/core.c:2358 kernel/sched/core.c:2840)
[ 1239.791152] schedule_preempt_disabled (kernel/sched/core.c:2897)
[ 1239.791152] cpu_startup_entry (kernel/sched/idle.c:252 kernel/sched/idle.c:274)
[ 1239.791152] start_secondary (arch/x86/kernel/smpboot.c:238)
[ 1239.791152] CPU: 0 PID: 32076 Comm: trinity-c767 Not tainted 3.18.0-rc4-next-20141117-sasha-00046-g481e3e8-dirty #1471
[ 1239.791152]  0000000000000000 0000000000000000 ffff88089a90b000 ffff880867223728
[ 1239.791152]  ffffffff920d1ec1 0000000000000030 ffff8805c3dd9100 ffff880867223768
[ 1239.791152]  ffffffff815d0617 ffffffff9eec7730 ffff88089a90bf58 ffff8805c3dd9100
[ 1239.791152] Call Trace:
[ 1239.791152] dump_stack (lib/dump_stack.c:52)
[ 1239.791152] spin_dump (kernel/locking/spinlock_debug.c:81 (discriminator 8))
[ 1239.791152] spin_bug (kernel/locking/spinlock_debug.c:89)
[ 1239.791152] do_raw_spin_lock (kernel/locking/spinlock_debug.c:97 kernel/locking/spinlock_debug.c:152)
[ 1239.791152] ? load_balance (kernel/sched/fair.c:5582 kernel/sched/fair.c:6884)
[ 1239.791152] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:117 kernel/locking/spinlock.c:159)
[ 1239.791152] ? load_balance (kernel/sched/fair.c:5582 kernel/sched/fair.c:6884)
[ 1239.791152] load_balance (kernel/sched/fair.c:5582 kernel/sched/fair.c:6884)
[ 1239.791152] pick_next_task_fair (kernel/sched/fair.c:7154 kernel/sched/fair.c:5093)
[ 1239.791152] ? pick_next_task_fair (kernel/sched/fair.c:7131 kernel/sched/fair.c:5093)
[ 1239.791152] __schedule (kernel/sched/core.c:2716 kernel/sched/core.c:2830)
[ 1239.791152] ? preempt_count_sub (kernel/sched/core.c:2644)
[ 1239.791152] schedule (kernel/sched/core.c:2870)
[ 1239.791152] p9_client_rpc (net/9p/client.c:756 (discriminator 13))
[ 1239.791152] ? __init_waitqueue_head (kernel/sched/wait.c:292)
[ 1239.791152] ? p9_idpool_put (net/9p/util.c:127)
[ 1239.791152] p9_client_read (net/9p/client.c:1582)
[ 1239.791152] ? p9_free_req (net/9p/client.c:410)
[ 1239.791152] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
[ 1239.791152] ? might_fault (mm/memory.c:3734)
[ 1239.791152] v9fs_fid_readn (fs/9p/vfs_file.c:386)
[ 1239.791152] v9fs_file_read (fs/9p/vfs_file.c:447)
[ 1239.791152] do_loop_readv_writev (fs/read_write.c:724)
[ 1239.791152] ? v9fs_fid_readn (fs/9p/vfs_file.c:433)
[ 1239.791152] ? v9fs_fid_readn (fs/9p/vfs_file.c:433)
[ 1239.791152] do_readv_writev (fs/read_write.c:856)
[ 1239.791152] ? save_stack_trace (arch/x86/kernel/stacktrace.c:64)
[ 1239.791152] vfs_readv (fs/read_write.c:882)
[ 1239.791152] SyS_readv (fs/read_write.c:908 fs/read_write.c:899)
[ 1239.791152] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

So I still can't explain why owner_cpu is "-1" rather than a valid CPU number, but the
trace *kinda* makes sense.

We locked the rq once in __schedule(), then load_balance() picked that same rq as the
busiest one and tried to lock it again.

The odd thing here is that load_balance() has a check just for that:

	BUG_ON(busiest == env.dst_rq);

But that doesn't get hit, even though the lock is only taken a bit later:

	more_balance:
		raw_spin_lock_irqsave(&busiest->lock, flags);

And there's no check right before the actual lock, so I've added another BUG_ON at
that point and will update with results.
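
Concretely, the change amounts to something like this (just a sketch of the idea
against load_balance() in kernel/sched/fair.c; the actual diff may look slightly
different):

	more_balance:
		/*
		 * Added check: busiest must still not be our own runqueue
		 * here, otherwise the raw_spin_lock_irqsave() below recurses
		 * on a rq->lock we already hold.
		 */
		BUG_ON(busiest == env.dst_rq);
		raw_spin_lock_irqsave(&busiest->lock, flags);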


Thanks,
Sasha


Thread overview: 45+ messages
2014-10-22  7:17 [PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign() Kirill Tkhai
2014-10-22 21:30 ` introduce task_rcu_dereference? Oleg Nesterov
2014-10-22 22:23   ` Oleg Nesterov
2014-10-23 18:15     ` Oleg Nesterov
2014-10-23  8:10   ` Kirill Tkhai
2014-10-23 18:18     ` Oleg Nesterov
2014-10-24  7:51       ` Kirill Tkhai
2014-10-27 19:53 ` [PATCH 0/3] introduce task_rcu_dereference() Oleg Nesterov
2014-10-27 19:54   ` [PATCH 1/3] probe_kernel_address() can use __probe_kernel_read() Oleg Nesterov
2014-10-27 19:54   ` [PATCH 2/3] introduce probe_slab_address() Oleg Nesterov
2014-10-27 19:21     ` Christoph Lameter
2014-10-28  5:44     ` Kirill Tkhai
2014-10-28  5:48       ` Kirill Tkhai
2014-10-28 15:01       ` Peter Zijlstra
2014-10-28 17:56         ` Kirill Tkhai
2014-10-28 18:00           ` Kirill Tkhai
2014-10-28 19:55           ` Oleg Nesterov
2014-10-28 20:12             ` Oleg Nesterov
2014-10-29  5:10               ` Kirill Tkhai
2014-10-27 19:54   ` [PATCH 3/3] introduce task_rcu_dereference() Oleg Nesterov
2014-10-28  6:22     ` Kirill Tkhai
2016-05-18 17:02     ` Peter Zijlstra
2016-05-18 18:23       ` Oleg Nesterov
2016-05-18 19:10         ` Peter Zijlstra
2016-05-18 19:57           ` Oleg Nesterov
2016-05-26 11:34             ` Peter Zijlstra
2016-06-03 10:49             ` [tip:sched/core] sched/fair: Use task_rcu_dereference() tip-bot for Oleg Nesterov
2016-06-03 10:48       ` [tip:sched/core] sched/api: Introduce task_rcu_dereference() and try_get_task_struct() tip-bot for Oleg Nesterov
2014-10-28 11:02 ` [tip:sched/core] sched/numa: Fix unsafe get_task_struct() in task_numa_assign() tip-bot for Kirill Tkhai
2014-11-08  3:48 ` [PATCH v4] sched/numa: fix " Sasha Levin
2014-11-09 14:07   ` Kirill Tkhai
2014-11-10 10:03     ` Peter Zijlstra
2014-11-10 15:48       ` Sasha Levin
2014-11-10 16:01         ` Peter Zijlstra
2014-11-16  9:50       ` [tip:sched/urgent] sched/numa: Avoid selecting oneself as swap target tip-bot for Peter Zijlstra
2014-11-10 16:03   ` [PATCH v4] sched/numa: fix unsafe get_task_struct() in task_numa_assign() Peter Zijlstra
2014-11-10 16:09     ` Sasha Levin
2014-11-10 16:16       ` Peter Zijlstra
2014-11-10 16:10     ` Kirill Tkhai
2014-11-10 16:36       ` Kirill Tkhai
2014-11-10 16:44         ` Sasha Levin
2014-11-10 20:01           ` Kirill Tkhai
2014-11-12  9:49             ` Kirill Tkhai
2014-11-15  2:38     ` Sasha Levin
2014-11-18 17:30       ` Sasha Levin [this message]
