linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael wang <wangyun@linux.vnet.ibm.com>
To: Sasha Levin <sasha.levin@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: sched: hang in migrate_swap
Date: Thu, 20 Feb 2014 12:32:47 +0800	[thread overview]
Message-ID: <5305856F.3000109@linux.vnet.ibm.com> (raw)
In-Reply-To: <5304F32A.4040907@oracle.com>

On 02/20/2014 02:08 AM, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest, running latest
> -next kernel, I see to hit the following hang quite often.

Fix for the stuck issue around idle_balance() is now in progress, this
may caused be the same problem, I suggest we do some retest after these
patch got merged.

Regards,
Michael Wang

> 
> The initial spew is:
> 
> [  293.110057] BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:258]
> [  293.110057] Modules linked in:
> [  293.110057] irq event stamp: 20828
> [  293.110057] hardirqs last  enabled at (20827): [<ffffffff84385437>]
> restore_args+0x0/0x30
> [  293.110057] hardirqs last disabled at (20828): [<ffffffff8438f02d>]
> apic_timer_interrupt+0x6d/0x80
> [  293.110057] softirqs last  enabled at (20826): [<ffffffff811418e7>]
> __do_softirq+0x447/0x4f0
> [  293.110057] softirqs last disabled at (20821): [<ffffffff81141fa3>]
> irq_exit+0x83/0x160
> [  293.110057] CPU: 8 PID: 258 Comm: migration/8 Tainted: G        W
> 3.14.0-rc3-next-20140218-sasha-00008-g16140ff-dirty #109
> [  293.110057] task: ffff8805b8283000 ti: ffff8805b82c0000 task.ti:
> ffff8805b82c0000
> [  293.110057] RIP: 0010:[<ffffffff81203892>]  [<ffffffff81203892>]
> multi_cpu_stop+0xc2/0x1c0
> [  293.110057] RSP: 0000:ffff8805b82c1cd8  EFLAGS: 00000293
> [  293.110057] RAX: 0000000000000001 RBX: ffffffff84385437 RCX:
> 0000000000000000
> [  293.110057] RDX: 0000000000000001 RSI: 0000000000000001 RDI:
> ffff8800c40bb978
> [  293.110057] RBP: ffff8805b82c1d18 R08: 0000000000000000 R09:
> 0000000000000000
> [  293.110057] R10: 0000000000000001 R11: 0000000000000001 R12:
> ffff8805b82c1c48
> [  293.110057] R13: ffff8805b8283000 R14: ffff8805b82c0000 R15:
> ffff8805b8283000
> [  293.110057] FS:  0000000000000000(0000) GS:ffff8805b9000000(0000)
> knlGS:0000000000000000
> [  293.110057] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  293.110057] CR2: 0000000000942d58 CR3: 0000000005c26000 CR4:
> 00000000000006e0
> [  293.110057] DR0: 000000000089e000 DR1: 00007f938d51a000 DR2:
> 0000000000000000
> [  293.110057] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000600
> [  293.110057] Stack:
> [  293.110057]  ffff8805b91d0be0 00000000c40bb9d8 ffff8805b82c1cf8
> ffff8805b91d0be0
> [  293.110057]  ffff8800c40bba28 ffffffff812037d0 ffff8800c40bb978
> ffff8805b91d0c30
> [  293.110057]  ffff8805b82c1de8 ffffffff81202f56 0000000000000001
> ffffffff81202e44
> [  293.110057] Call Trace:
> [  293.110057]  [<ffffffff812037d0>] ? stop_cpus+0x60/0x60
> [  293.110057]  [<ffffffff81202f56>] cpu_stopper_thread+0x96/0x190
> [  293.110057]  [<ffffffff81202e44>] ? cpu_stop_should_run+0x44/0x60
> [  293.110057]  [<ffffffff811a3d1a>] ? __lock_release+0x1da/0x1f0
> [  293.110057]  [<ffffffff84384ec5>] ?
> _raw_spin_unlock_irqrestore+0x65/0xb0
> [  293.110057]  [<ffffffff811a1f0d>] ? trace_hardirqs_on+0xd/0x10
> [  293.110057]  [<ffffffff81170826>] smpboot_thread_fn+0x2b6/0x2c0
> [  293.110057]  [<ffffffff81170570>] ?
> smpboot_register_percpu_thread+0x100/0x100
> [  293.110057]  [<ffffffff81167755>] kthread+0x105/0x110
> [  293.110057]  [<ffffffff8437fca2>] ? wait_for_completion+0xc2/0x110
> [  293.110057]  [<ffffffff81167650>] ? set_kthreadd_affinity+0x30/0x30
> [  293.110057]  [<ffffffff8438e28c>] ret_from_fork+0x7c/0xb0
> [  293.110057]  [<ffffffff81167650>] ? set_kthreadd_affinity+0x30/0x30
> [  293.110057] Code: b5 05 01 66 0f 1f 44 00 00 45 89 ed 4d 0f a3 2e 45
> 19 ed 45 85 ed 41 0f 95 c7 4c 8d 73 24 31 c0 c7 45 cc 00 00 00 00 66 90
> f3 90 <44> 8b 6b 20 41 39 c5 74 66 41 83 fd 02 74 0f 41 83 fd 03 75 41
> 
> This is followed by a dump of the rest of the CPU states, and I see the
> following:
> 
> [  268.450444] NMI backtrace for cpu 34
> [  268.450444] CPU: 34 PID: 9859 Comm: trinity-c129 Tainted: G        W
> 3.14.0-rc3-next-20140218-sasha-00008-g16140ff-dirty #109
> [  268.450444] task: ffff880154aa3000 ti: ffff8800c40ba000 task.ti:
> ffff8800c40ba000
> [  268.450444] RIP: 0010:[<ffffffff810b801a>]  [<ffffffff810b801a>]
> pvclock_clocksource_read+0x1a/0xc0
> [  268.450444] RSP: 0018:ffff8800c40bb538  EFLAGS: 00000082
> [  268.450444] RAX: 00000000905f62ec RBX: ffff8805b93d8500 RCX:
> 000000000000000a
> [  268.450444] RDX: 00000000000000a1 RSI: 0000000000000040 RDI:
> ffff881027f12880
> [  268.450444] RBP: ffff8800c40bb548 R08: 0000000000000000 R09:
> 0000000000000001
> [  268.450444] R10: 0000000000000002 R11: 0000000000000001 R12:
> ffff8805b93d8510
> [  268.450444] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000001
> [  268.450444] FS:  00007f938e16a700(0000) GS:ffff8805b9200000(0000)
> knlGS:0000000000000000
> [  268.450444] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  268.450444] CR2: 00007f938a658a38 CR3: 00000000c40b2000 CR4:
> 00000000000006e0
> [  268.450444] DR0: 000000000089e000 DR1: 00007f938d51a000 DR2:
> 0000000000000000
> [  268.450444] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000f7060a
> [  268.450444] Stack:
> [  268.450444]  ffff8805b93d8500 ffff8805b93d8510 ffff8800c40bb558
> ffffffff810b6f04
> [  268.450444]  ffff8800c40bb568 ffffffff81076e1d ffff8800c40bb598
> ffffffff81180635
> [  268.450444]  ffffffff875f7be0 0000000000000022 00000000001d8500
> ffff8805b93d8500
> [  268.450444] Call Trace:
> [  268.450444]  [<ffffffff810b6f04>] kvm_clock_read+0x24/0x50
> [  268.450444]  [<ffffffff81076e1d>] sched_clock+0x1d/0x30
> [  268.450444]  [<ffffffff81180635>] sched_clock_local+0x25/0x90
> [  268.450444]  [<ffffffff811808a8>] sched_clock_cpu+0xb8/0x110
> [  268.450444]  [<ffffffff8118094d>] local_clock+0x2d/0x40
> [  268.450444]  [<ffffffff811a3358>] __lock_acquire+0x2a8/0x580
> [  268.450444]  [<ffffffff8118c84d>] ? update_blocked_averages+0x61d/0x670
> [  268.450444]  [<ffffffff811a37b2>] lock_acquire+0x182/0x1d0
> [  268.450444]  [<ffffffff8118ce2c>] ? idle_balance+0x3c/0x2c0
> [  268.450444]  [<ffffffff8118ce61>] idle_balance+0x71/0x2c0
> [  268.450444]  [<ffffffff8118ce2c>] ? idle_balance+0x3c/0x2c0
> [  268.450444]  [<ffffffff8118d2fe>] pick_next_task_fair+0x24e/0x290
> [  268.450444]  [<ffffffff81173a2e>] ? dequeue_task+0x6e/0x80
> [  268.450444]  [<ffffffff8437e3b5>] __schedule+0x2a5/0x840
> [  268.450444]  [<ffffffff8437ec15>] schedule+0x65/0x70
> [  268.450444]  [<ffffffff8437d8d8>] schedule_timeout+0x38/0x2a0
> [  268.450444]  [<ffffffff8119c8fe>] ? put_lock_stats+0xe/0x30
> [  268.450444]  [<ffffffff811a196c>] ? mark_held_locks+0x6c/0x90
> [  268.450444]  [<ffffffff84384e0b>] ? _raw_spin_unlock_irq+0x2b/0x80
> [  268.450444]  [<ffffffff8437fc97>] wait_for_completion+0xb7/0x110
> [  268.450444]  [<ffffffff8117e6b0>] ? try_to_wake_up+0x2a0/0x2a0
> [  268.450444]  [<ffffffff812034a1>] stop_two_cpus+0x261/0x2b0
> [  268.450444]  [<ffffffff81176c00>] ? __migrate_swap_task+0xa0/0xa0
> [  268.450444]  [<ffffffff810b6f04>] ? kvm_clock_read+0x24/0x50
> [  268.450444]  [<ffffffff812037d0>] ? stop_cpus+0x60/0x60
> [  268.450444]  [<ffffffff812037d0>] ? stop_cpus+0x60/0x60
> [  268.450444]  [<ffffffff8119c8fe>] ? put_lock_stats+0xe/0x30
> [  268.450444]  [<ffffffff8437fc1f>] ? wait_for_completion+0x3f/0x110
> [  268.450444]  [<ffffffff811740d5>] migrate_swap+0x175/0x1a0
> [  268.450444]  [<ffffffff8118b1e0>] task_numa_migrate+0x510/0x600
> [  268.450444]  [<ffffffff8118acd0>] ? select_task_rq_fair+0x440/0x440
> [  268.450444]  [<ffffffff8118b31d>] numa_migrate_preferred+0x4d/0x60
> [  268.450444]  [<ffffffff8118b900>] task_numa_fault+0x190/0x210
> [  268.450444]  [<ffffffff812c4f62>] do_huge_pmd_numa_page+0x432/0x450
> [  268.450444]  [<ffffffff8119c8fe>] ? put_lock_stats+0xe/0x30
> [  268.450444]  [<ffffffff81abccda>] ? delay_tsc+0xea/0x110
> [  268.450444]  [<ffffffff81288724>] __handle_mm_fault+0x244/0x3a0
> [  268.450444]  [<ffffffff812c776d>] ? rcu_read_unlock+0x5d/0x60
> [  268.450444]  [<ffffffff8128898b>] handle_mm_fault+0x10b/0x1b0
> [  268.450444]  [<ffffffff84389252>] ? __do_page_fault+0x2e2/0x590
> [  268.450444]  [<ffffffff843894c1>] __do_page_fault+0x551/0x590
> [  268.450444]  [<ffffffff81181561>] ? vtime_account_user+0x91/0xa0
> [  268.450444]  [<ffffffff8124f0c8>] ?
> context_tracking_user_exit+0xa8/0x1c0
> [  268.450444]  [<ffffffff84384f40>] ? _raw_spin_unlock+0x30/0x50
> [  268.450444]  [<ffffffff81181561>] ? vtime_account_user+0x91/0xa0
> [  268.450444]  [<ffffffff8124f0c8>] ?
> context_tracking_user_exit+0xa8/0x1c0
> [  268.450444]  [<ffffffff843895bd>] do_page_fault+0x3d/0x70
> [  268.450444]  [<ffffffff843888b5>] do_async_page_fault+0x35/0x100
> [  268.450444]  [<ffffffff84385798>] async_page_fault+0x28/0x30
> [  268.450444] Code: 6d e1 10 00 e8 98 8a 15 00 c9 c3 66 0f 1f 44 00 00
> 55 48 89 e5 48 83 ec 10 8b 37 eb 06 0f 1f 40 00 89 c6 0f 1f 00 0f ae e8
> 0f 31 <48> c1 e2 20 89 c0 0f be 4f 1c 48 09 c2 44 8b 47 18 48 2b 57 08
> 
> The stack trace is always pointing to migrate_swap() calling
> stop_two_cpus(). Note that I don't
> see another CPU in a stopped state when it occurs.
> 
> 
> Thanks,
> Sasha
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


  reply	other threads:[~2014-02-20  4:32 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-19 18:08 sched: hang in migrate_swap Sasha Levin
2014-02-20  4:32 ` Michael wang [this message]
2014-02-21 16:43   ` Sasha Levin
2014-02-22  1:45     ` Michael wang
2014-02-24  3:23       ` Sasha Levin
2014-02-24  5:19         ` Michael wang
2014-02-24  5:54           ` Sasha Levin
2014-02-24  7:10           ` Peter Zijlstra
2014-02-24 10:14             ` Michael wang
2014-02-24 12:12               ` Peter Zijlstra
2014-02-24 13:10                 ` Peter Zijlstra
2014-02-25  4:47                   ` Michael wang
2014-02-25 10:49                     ` Peter Zijlstra
2014-02-26  2:32                       ` Michael wang
2014-02-24 18:21                 ` Sasha Levin
2014-02-25  2:48                   ` Michael wang
2014-02-25 11:03                   ` Peter Zijlstra
2014-02-25  3:01                 ` Michael wang
2014-02-27 13:33                 ` [tip:sched/core] sched: Guarantee task priority in pick_next_task () tip-bot for Peter Zijlstra
2014-04-10  3:31                 ` sched: hang in migrate_swap Sasha Levin
2014-04-10  6:59                   ` Michael wang
2014-04-10 13:38                     ` Kirill Tkhai
2014-04-11 14:32                       ` Sasha Levin
2014-04-11 15:16                         ` Kirill Tkhai
2014-05-12 18:48                           ` Sasha Levin
2014-05-14  9:42                             ` Kirill Tkhai
2014-05-14 10:13                               ` Peter Zijlstra
2014-05-14 10:21                                 ` Kirill Tkhai
2014-05-14 10:26                                   ` Peter Zijlstra
2014-05-14 11:20                                     ` Peter Zijlstra
2015-06-15 19:38                                     ` Rafael David Tinoco
2015-06-15 19:47                                       ` Peter Zijlstra
2014-04-18  8:24                       ` [tip:sched/urgent] sched: Check for stop task appearance when balancing happens tip-bot for Kirill Tkhai
2014-04-10  7:42                   ` sched: hang in migrate_swap Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5305856F.3000109@linux.vnet.ibm.com \
    --to=wangyun@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).