linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sachin Sant <sachinp@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <zwisler@gmail.com>, Mike Galbraith <efault@gmx.de>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
Date: Fri, 3 Feb 2017 10:03:14 +0530	[thread overview]
Message-ID: <3BAB5673-2BF8-46E0-8018-6A0F3DEEC9F7@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170202155506.GX6515@twins.programming.kicks-ass.net>


> On 02-Feb-2017, at 9:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
>> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.
> 

I ran few cycles of cpu hot(un)plug tests. In most cases it works except one
where I ran into rcu stall:

[  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
[  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
[  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
[  173.493482] Task dump for CPU 8:
[  173.493484] cpuhp/8         R  running task        0  3416      2 0x00000884
[  173.493489] Call Trace:
[  173.493492] [c0000004f7b834a0] [c0000004f7b83560] 0xc0000004f7b83560 (unreliable)
[  173.493498] [c0000004f7b83670] [c000000000008d28] alignment_common+0x128/0x130
[  173.493503] --- interrupt: 600 at _raw_spin_lock+0x2c/0xc0
[  173.493503]     LR = try_to_wake_up+0x204/0x5c0
[  173.493507] [c0000004f7b83960] [c0000004f4d8084c] 0xc0000004f4d8084c (unreliable)
[  173.493511] [c0000004f7b83990] [c0000000000fef54] try_to_wake_up+0x204/0x5c0
[  173.493515] [c0000004f7b83a10] [c0000000000e2b88] create_worker+0x148/0x250
[  173.493519] [c0000004f7b83ab0] [c0000000000e6e1c] alloc_unbound_pwq+0x3bc/0x4c0
[  173.493522] [c0000004f7b83b10] [c0000000000e7084] wq_update_unbound_numa+0x164/0x270
[  173.493526] [c0000004f7b83bb0] [c0000000000e8990] workqueue_online_cpu+0x250/0x3b0
[  173.493529] [c0000004f7b83c70] [c0000000000c2758] cpuhp_invoke_callback+0x148/0x5b0
[  173.493533] [c0000004f7b83ce0] [c0000000000c2df8] cpuhp_up_callbacks+0x48/0x140
[  173.493536] [c0000004f7b83d30] [c0000000000c3e98] cpuhp_thread_fun+0x148/0x180
[  173.493540] [c0000004f7b83d60] [c0000000000f3930] smpboot_thread_fn+0x290/0x2a0
[  173.493544] [c0000004f7b83dc0] [c0000000000edb3c] kthread+0x14c/0x190
[  173.493547] [c0000004f7b83e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  243.913715] INFO: task kworker/0:2:380 blocked for more than 120 seconds.
[  243.913732]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.913738] kworker/0:2     D    0   380      2 0x00000800
[  243.913746] Workqueue: events vmstat_shepherd
[  243.913748] Call Trace:
[  243.913752] [c0000000ff07f820] [c00000000011135c] enqueue_entity+0x81c/0x1200 (unreliable)
[  243.913757] [c0000000ff07f9f0] [c00000000001a660] __switch_to+0x300/0x400
[  243.913762] [c0000000ff07fa50] [c0000000008df4f4] __schedule+0x314/0xb10
[  243.913766] [c0000000ff07fb20] [c0000000008dfd30] schedule+0x40/0xb0
[  243.913769] [c0000000ff07fb50] [c0000000008e02b8] schedule_preempt_disabled+0x18/0x30
[  243.913773] [c0000000ff07fb70] [c0000000008e1654] __mutex_lock.isra.6+0x1a4/0x660
[  243.913777] [c0000000ff07fc00] [c0000000000c3828] get_online_cpus+0x48/0x90
[  243.913780] [c0000000ff07fc30] [c00000000025fd78] vmstat_shepherd+0x38/0x150
[  243.913784] [c0000000ff07fc80] [c0000000000e5794] process_one_work+0x1a4/0x4d0
[  243.913788] [c0000000ff07fd20] [c0000000000e5b58] worker_thread+0x98/0x5a0
[  243.913791] [c0000000ff07fdc0] [c0000000000edb3c] kthread+0x14c/0x190
[  243.913795] [c0000000ff07fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  243.913824] INFO: task drmgr:3413 blocked for more than 120 seconds.
[  243.913826]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.913831] drmgr           D    0  3413   3114 0x00040080
[  243.913834] Call Trace:
[  243.913836] [c000000257ff3380] [c000000257ff3440] 0xc000000257ff3440 (unreliable)
[  243.913840] [c000000257ff3550] [c00000000001a660] __switch_to+0x300/0x400
[  243.913844] [c000000257ff35b0] [c0000000008df4f4] __schedule+0x314/0xb10
[  243.913847] [c000000257ff3680] [c0000000008dfd30] schedule+0x40/0xb0
[  243.913851] [c000000257ff36b0] [c0000000008e4594] schedule_timeout+0x274/0x470
[  243.913855] [c000000257ff37b0] [c0000000008e0efc] wait_for_common+0x1ac/0x2c0
[  243.913858] [c000000257ff3830] [c0000000000c50e4] bringup_cpu+0x84/0xe0
[  243.913862] [c000000257ff3860] [c0000000000c2758] cpuhp_invoke_callback+0x148/0x5b0
[  243.913865] [c000000257ff38d0] [c0000000000c2df8] cpuhp_up_callbacks+0x48/0x140
[  243.913868] [c000000257ff3920] [c0000000000c5438] _cpu_up+0xe8/0x1c0
[  243.913872] [c000000257ff3980] [c0000000000c5630] do_cpu_up+0x120/0x150
[  243.913876] [c000000257ff3a00] [c0000000005c005c] cpu_subsys_online+0x5c/0xe0
[  243.913879] [c000000257ff3a50] [c0000000005b7d84] device_online+0xb4/0x120
[  243.913883] [c000000257ff3a90] [c000000000093424] dlpar_online_cpu+0x144/0x1e0
[  243.913887] [c000000257ff3b50] [c000000000093c08] dlpar_cpu_add+0x108/0x2f0
[  243.913891] [c000000257ff3be0] [c0000000000948dc] dlpar_cpu_probe+0x3c/0x80
[  243.913894] [c000000257ff3c20] [c0000000000207a8] arch_cpu_probe+0x38/0x60
[  243.913898] [c000000257ff3c40] [c0000000005c0880] cpu_probe_store+0x40/0x70
[  243.913902] [c000000257ff3c70] [c0000000005b2e94] dev_attr_store+0x34/0x60
[  243.913906] [c000000257ff3c90] [c0000000003b0fc4] sysfs_kf_write+0x64/0xa0
[  243.913910] [c000000257ff3cb0] [c0000000003afd10] kernfs_fop_write+0x170/0x250
[  243.913914] [c000000257ff3d00] [c0000000002fb0f0] __vfs_write+0x40/0x1c0
[  243.913917] [c000000257ff3d90] [c0000000002fcba8] vfs_write+0xc8/0x240
[  243.913921] [c000000257ff3de0] [c0000000002fe790] SyS_write+0x60/0x110
[  243.913924] [c000000257ff3e30] [c00000000000b184] system_call+0x38/0xe0
[  243.913929] INFO: task ppc64_cpu:3423 blocked for more than 120 seconds.
[  243.913931]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Thanks
-Sachin

  parent reply	other threads:[~2017-02-03  4:33 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
2016-09-21 13:38 ` [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks Matt Fleming
2016-10-03 12:49   ` Peter Zijlstra
2016-10-03 14:37     ` Matt Fleming
2016-10-03 14:42       ` Peter Zijlstra
2016-09-21 13:38 ` [PATCH v2 2/7] sched/fair: Update rq clock before waking up new task Matt Fleming
2016-09-21 13:38 ` [PATCH v2 3/7] sched/fair: Update rq clock in task_hot() Matt Fleming
2016-09-21 13:38 ` [PATCH v2 4/7] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
2017-01-14 12:40   ` [tip:sched/core] sched/core: " tip-bot for Matt Fleming
2016-09-21 13:38 ` [PATCH v2 5/7] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
2017-01-14 12:41   ` [tip:sched/core] " tip-bot for Matt Fleming
2016-09-21 13:38 ` [PATCH v2 6/7] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
2017-01-14 12:41   ` [tip:sched/core] " tip-bot for Matt Fleming
2016-09-21 13:38 ` [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
2016-09-21 15:58   ` Petr Mladek
2016-09-21 19:08     ` Matt Fleming
2016-09-21 19:46       ` Thomas Gleixner
2016-09-22  0:44       ` Sergey Senozhatsky
2016-09-22  8:04     ` Peter Zijlstra
2016-09-22  8:36       ` Jan Kara
2016-09-22  9:39         ` Peter Zijlstra
2016-09-22 10:17           ` Peter Zijlstra
2017-01-14 12:44   ` [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls tip-bot for Matt Fleming
     [not found]     ` <87tw8gutp6.fsf@concordia.ellerman.id.au>
2017-01-30 21:34       ` Matt Fleming
2017-01-31  8:35         ` Michael Ellerman
2017-01-31 11:00         ` Sachin Sant
2017-01-31 11:48           ` Mike Galbraith
2017-01-31 17:22             ` Ross Zwisler
2017-02-02 15:55               ` Peter Zijlstra
2017-02-02 22:01                 ` Matt Fleming
2017-02-03  3:05                 ` Mike Galbraith
2017-02-03  4:33                 ` Sachin Sant [this message]
2017-02-03  8:53                   ` Peter Zijlstra
2017-02-03 12:59                     ` Mike Galbraith
2017-02-03 13:37                       ` Peter Zijlstra
2017-02-03 13:52                         ` Mike Galbraith
2017-02-03 15:44                         ` Paul E. McKenney
2017-02-03 15:54                           ` Paul E. McKenney
2017-02-06  6:23                             ` Sachin Sant
2017-02-06 15:10                               ` Paul E. McKenney
2017-02-06 15:14                                 ` Paul E. McKenney
2017-02-03 13:04                 ` Borislav Petkov
2017-02-22  9:03                 ` Wanpeng Li
2017-02-24  9:16                 ` [tip:sched/urgent] sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume) tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3BAB5673-2BF8-46E0-8018-6A0F3DEEC9F7@linux.vnet.ibm.com \
    --to=sachinp@linux.vnet.ibm.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=matt@codeblueprint.co.uk \
    --cc=mpe@ellerman.id.au \
    --cc=peterz@infradead.org \
    --cc=zwisler@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).