From: "Niklas Söderlund" <niklas.soderlund@ragnatech.se> To: Vincent Guittot <vincent.guittot@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org Subject: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") Date: Thu, 12 Apr 2018 11:18:22 +0200 [thread overview] Message-ID: <20180412091822.GG12256@bigcity.dyn.berto.se> (raw) Hi Vincent, I have observed issues running on linus/master from a few days back [1]. I'm running on a Renesas Koelsch board (arm32) and I can trigger a issue by X forwarding the v4l2 test application qv4l2 over ssh and moving the courser around in the GUI (best test case description award...). I'm sorry about the really bad way I trigger this but I can't do it in any other way, I'm happy to try other methods if you got some ideas. The symptom of the issue is a complete hang of the system for more then 30 seconds and then this information is printed in the console: [ 142.849390] INFO: rcu_sched detected stalls on CPUs/tasks: [ 142.854972] 1-...!: (1 GPs behind) idle=7a4/0/0 softirq=3214/3217 fqs=0 [ 142.861976] (detected by 0, t=8232 jiffies, g=930, c=929, q=11) [ 142.868042] Sending NMI from CPU 0 to CPUs 1: [ 142.872443] NMI backtrace for cpu 1 [ 142.872452] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14 [ 142.872455] Hardware name: Generic R8A7791 (Flattened Device Tree) [ 142.872473] PC is at arch_cpu_idle+0x28/0x44 [ 142.872484] LR is at trace_hardirqs_on_caller+0x1a4/0x1d4 [ 142.872488] pc : [<c01088a4>] lr : [<c01747a8>] psr: 20070013 [ 142.872491] sp : eb0b9f90 ip : eb0b9f60 fp : eb0b9f9c [ 142.872495] r10: 00000000 r9 : 413fc0f2 r8 : 4000406a [ 142.872498] r7 : c0c08478 r6 : c0c0842c r5 : ffffe000 r4 : 00000002 [ 142.872502] r3 : eb0b6ec0 r2 : 00000000 r1 : 00000004 r0 : 00000001 [ 142.872507] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 142.872511] Control: 10c5387d Table: 6a61406a DAC: 00000051 [ 142.872516] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14 [ 142.872519] Hardware name: Generic R8A7791 (Flattened Device Tree) [ 142.872522] Backtrace: [ 142.872534] [<c010cbf8>] (dump_backtrace) from [<c010ce78>] (show_stack+0x18/0x1c) [ 142.872540] r7:c0c81388 r6:00000000 r5:60070193 r4:c0c81388 [ 142.872550] [<c010ce60>] (show_stack) from [<c07dec20>] (dump_stack+0xa4/0xd8) [ 142.872557] [<c07deb7c>] (dump_stack) from [<c0108b38>] (show_regs+0x14/0x18) [ 142.872563] r9:00000001 r8:00000000 r7:c0c4f678 r6:eb0b9f40 r5:00000001 r4:c13e1130 [ 142.872571] [<c0108b24>] (show_regs) from [<c07e4cd0>] (nmi_cpu_backtrace+0xfc/0x118) [ 142.872578] [<c07e4bd4>] (nmi_cpu_backtrace) from [<c010fb28>] (handle_IPI+0x2a8/0x320) [ 142.872583] r7:c0c4f678 r6:eb0b9f40 r5:00000007 r4:c0b75b68 [ 142.872594] [<c010f880>] (handle_IPI) from [<c03cb528>] (gic_handle_irq+0x8c/0x98) [ 142.872599] r10:00000000 r9:eb0b8000 r8:f0803000 r7:c0c4f678 r6:eb0b9f40 r5:c0c08a90 [ 142.872602] r4:f0802000 [ 142.872608] [<c03cb49c>] (gic_handle_irq) from [<c01019f0>] (__irq_svc+0x70/0x98) [ 142.872612] Exception stack(0xeb0b9f40 to 0xeb0b9f88) [ 142.872618] 9f40: 00000001 00000004 00000000 eb0b6ec0 00000002 ffffe000 c0c0842c c0c08478 [ 142.872624] 9f60: 4000406a 413fc0f2 00000000 eb0b9f9c eb0b9f60 eb0b9f90 c01747a8 c01088a4 [ 142.872627] 9f80: 20070013 ffffffff [ 142.872632] r9:eb0b8000 r8:4000406a r7:eb0b9f74 r6:ffffffff r5:20070013 r4:c01088a4 [ 142.872642] [<c010887c>] (arch_cpu_idle) from [<c07fb1a8>] (default_idle_call+0x34/0x38) [ 142.872650] [<c07fb174>] (default_idle_call) from [<c0155fe0>] (do_idle+0xe0/0x134) [ 142.872656] [<c0155f00>] (do_idle) from [<c0156398>] (cpu_startup_entry+0x20/0x24) [ 142.872660] r7:c0c8e9d0 r6:10c0387d r5:00000051 r4:00000085 [ 142.872667] [<c0156378>] (cpu_startup_entry) from [<c010f644>] (secondary_start_kernel+0x114/0x134) [ 142.872673] [<c010f530>] (secondary_start_kernel) from [<401026ec>] (0x401026ec) [ 142.872676] r5:00000051 r4:6b0a406a [ 142.873456] rcu_sched kthread starved for 8235 jiffies! g930 c929 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 [ 143.135040] RCU grace-period kthread stack dump: [ 143.139695] rcu_sched I 0 9 2 0x00000000 [ 143.145234] Backtrace: [ 143.147719] [<c07f5120>] (__schedule) from [<c07f5b3c>] (schedule+0x94/0xb8) [ 143.154823] r10:c0b714c0 r9:c0c85f8a r8:ffffffff r7:eb0abec4 r6:ffffa274 r5:00000000 [ 143.162712] r4:ffffe000 [ 143.165273] [<c07f5aa8>] (schedule) from [<c07fa940>] (schedule_timeout+0x440/0x4b0) [ 143.173076] r5:00000000 r4:eb79c4c0 [ 143.176692] [<c07fa500>] (schedule_timeout) from [<c0196e08>] (rcu_gp_kthread+0x958/0x150c) [ 143.185108] r10:c0c87274 r9:00000000 r8:c0c165b8 r7:00000001 r6:00000000 r5:c0c16590 [ 143.192997] r4:c0c16300 [ 143.195560] [<c01964b0>] (rcu_gp_kthread) from [<c0145eb8>] (kthread+0x148/0x160) [ 143.203099] r7:c0c16300 [ 143.205660] [<c0145d70>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) [ 143.212938] Exception stack(0xeb0abfb0 to 0xeb0abff8) [ 143.218030] bfa0: 00000000 00000000 00000000 00000000 [ 143.226271] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 143.234511] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 143.241177] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0145d70 [ 143.249065] r4:eb037b00 After the freeze the system becomes responsive again and I can sometimes trigger the hang multiple times. I tried to bisect the problem and I found that by reverting [2] I can no longer reproduce the issue. I can also not reproduce the issue on v4.16. I can't figure out if reverting [2] is just treating a symptom or the root cause of my troubles and would appreciate your input. Also my "test-case" do not trigger every time but I have tested this scenario quiet a lot and the result seems to be constant. My test setup involves a NFS root filesystem, I ssh in and launch the GUI application over X forwarding. From what I know the application is not doing any ioctl calls to the v4l2 framework it's just sitting there idle as I move the courser around showing tool tips and such as I hover over elements and then it freeze up. I have not observed this issue by just booting the system and leaving it idle, movement in the GUI seems to be the key to trigger this. I'm a bit lost on how to progress with this issue and would appreciate any help you can provide to help me figure this out. 1. c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) 2. 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") -- Regards, Niklas Söderlund
WARNING: multiple messages have this Message-ID (diff)
From: "Niklas Söderlund" <niklas.soderlund@ragnatech.se> To: Vincent Guittot <vincent.guittot@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, Ingo Molnar <mingo@redhat.com>, linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org Subject: Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") Date: Thu, 12 Apr 2018 11:18:22 +0200 [thread overview] Message-ID: <20180412091822.GG12256@bigcity.dyn.berto.se> (raw) Hi Vincent, I have observed issues running on linus/master from a few days back [1]. I'm running on a Renesas Koelsch board (arm32) and I can trigger a issue by X forwarding the v4l2 test application qv4l2 over ssh and moving the courser around in the GUI (best test case description award...). I'm sorry about the really bad way I trigger this but I can't do it in any other way, I'm happy to try other methods if you got some ideas. The symptom of the issue is a complete hang of the system for more then 30 seconds and then this information is printed in the console: [ 142.849390] INFO: rcu_sched detected stalls on CPUs/tasks: [ 142.854972] 1-...!: (1 GPs behind) idle=7a4/0/0 softirq=3214/3217 fqs=0 [ 142.861976] (detected by 0, t=8232 jiffies, g=930, c=929, q=11) [ 142.868042] Sending NMI from CPU 0 to CPUs 1: [ 142.872443] NMI backtrace for cpu 1 [ 142.872452] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14 [ 142.872455] Hardware name: Generic R8A7791 (Flattened Device Tree) [ 142.872473] PC is at arch_cpu_idle+0x28/0x44 [ 142.872484] LR is at trace_hardirqs_on_caller+0x1a4/0x1d4 [ 142.872488] pc : [<c01088a4>] lr : [<c01747a8>] psr: 20070013 [ 142.872491] sp : eb0b9f90 ip : eb0b9f60 fp : eb0b9f9c [ 142.872495] r10: 00000000 r9 : 413fc0f2 r8 : 4000406a [ 142.872498] r7 : c0c08478 r6 : c0c0842c r5 : ffffe000 r4 : 00000002 [ 142.872502] r3 : eb0b6ec0 r2 : 00000000 r1 : 00000004 r0 : 00000001 [ 142.872507] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 142.872511] Control: 10c5387d Table: 6a61406a DAC: 00000051 [ 142.872516] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.16.0-05506-g28aba11c1393691a #14 [ 142.872519] Hardware name: Generic R8A7791 (Flattened Device Tree) [ 142.872522] Backtrace: [ 142.872534] [<c010cbf8>] (dump_backtrace) from [<c010ce78>] (show_stack+0x18/0x1c) [ 142.872540] r7:c0c81388 r6:00000000 r5:60070193 r4:c0c81388 [ 142.872550] [<c010ce60>] (show_stack) from [<c07dec20>] (dump_stack+0xa4/0xd8) [ 142.872557] [<c07deb7c>] (dump_stack) from [<c0108b38>] (show_regs+0x14/0x18) [ 142.872563] r9:00000001 r8:00000000 r7:c0c4f678 r6:eb0b9f40 r5:00000001 r4:c13e1130 [ 142.872571] [<c0108b24>] (show_regs) from [<c07e4cd0>] (nmi_cpu_backtrace+0xfc/0x118) [ 142.872578] [<c07e4bd4>] (nmi_cpu_backtrace) from [<c010fb28>] (handle_IPI+0x2a8/0x320) [ 142.872583] r7:c0c4f678 r6:eb0b9f40 r5:00000007 r4:c0b75b68 [ 142.872594] [<c010f880>] (handle_IPI) from [<c03cb528>] (gic_handle_irq+0x8c/0x98) [ 142.872599] r10:00000000 r9:eb0b8000 r8:f0803000 r7:c0c4f678 r6:eb0b9f40 r5:c0c08a90 [ 142.872602] r4:f0802000 [ 142.872608] [<c03cb49c>] (gic_handle_irq) from [<c01019f0>] (__irq_svc+0x70/0x98) [ 142.872612] Exception stack(0xeb0b9f40 to 0xeb0b9f88) [ 142.872618] 9f40: 00000001 00000004 00000000 eb0b6ec0 00000002 ffffe000 c0c0842c c0c08478 [ 142.872624] 9f60: 4000406a 413fc0f2 00000000 eb0b9f9c eb0b9f60 eb0b9f90 c01747a8 c01088a4 [ 142.872627] 9f80: 20070013 ffffffff [ 142.872632] r9:eb0b8000 r8:4000406a r7:eb0b9f74 r6:ffffffff r5:20070013 r4:c01088a4 [ 142.872642] [<c010887c>] (arch_cpu_idle) from [<c07fb1a8>] (default_idle_call+0x34/0x38) [ 142.872650] [<c07fb174>] (default_idle_call) from [<c0155fe0>] (do_idle+0xe0/0x134) [ 142.872656] [<c0155f00>] (do_idle) from [<c0156398>] (cpu_startup_entry+0x20/0x24) [ 142.872660] r7:c0c8e9d0 r6:10c0387d r5:00000051 r4:00000085 [ 142.872667] [<c0156378>] (cpu_startup_entry) from [<c010f644>] (secondary_start_kernel+0x114/0x134) [ 142.872673] [<c010f530>] (secondary_start_kernel) from [<401026ec>] (0x401026ec) [ 142.872676] r5:00000051 r4:6b0a406a [ 142.873456] rcu_sched kthread starved for 8235 jiffies! g930 c929 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 [ 143.135040] RCU grace-period kthread stack dump: [ 143.139695] rcu_sched I 0 9 2 0x00000000 [ 143.145234] Backtrace: [ 143.147719] [<c07f5120>] (__schedule) from [<c07f5b3c>] (schedule+0x94/0xb8) [ 143.154823] r10:c0b714c0 r9:c0c85f8a r8:ffffffff r7:eb0abec4 r6:ffffa274 r5:00000000 [ 143.162712] r4:ffffe000 [ 143.165273] [<c07f5aa8>] (schedule) from [<c07fa940>] (schedule_timeout+0x440/0x4b0) [ 143.173076] r5:00000000 r4:eb79c4c0 [ 143.176692] [<c07fa500>] (schedule_timeout) from [<c0196e08>] (rcu_gp_kthread+0x958/0x150c) [ 143.185108] r10:c0c87274 r9:00000000 r8:c0c165b8 r7:00000001 r6:00000000 r5:c0c16590 [ 143.192997] r4:c0c16300 [ 143.195560] [<c01964b0>] (rcu_gp_kthread) from [<c0145eb8>] (kthread+0x148/0x160) [ 143.203099] r7:c0c16300 [ 143.205660] [<c0145d70>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) [ 143.212938] Exception stack(0xeb0abfb0 to 0xeb0abff8) [ 143.218030] bfa0: 00000000 00000000 00000000 00000000 [ 143.226271] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 143.234511] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 143.241177] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0145d70 [ 143.249065] r4:eb037b00 After the freeze the system becomes responsive again and I can sometimes trigger the hang multiple times. I tried to bisect the problem and I found that by reverting [2] I can no longer reproduce the issue. I can also not reproduce the issue on v4.16. I can't figure out if reverting [2] is just treating a symptom or the root cause of my troubles and would appreciate your input. Also my "test-case" do not trigger every time but I have tested this scenario quiet a lot and the result seems to be constant. My test setup involves a NFS root filesystem, I ssh in and launch the GUI application over X forwarding. From what I know the application is not doing any ioctl calls to the v4l2 framework it's just sitting there idle as I move the courser around showing tool tips and such as I hover over elements and then it freeze up. I have not observed this issue by just booting the system and leaving it idle, movement in the GUI seems to be the key to trigger this. I'm a bit lost on how to progress with this issue and would appreciate any help you can provide to help me figure this out. 1. c18bb396d3d261eb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")) 2. 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") -- Regards, Niklas S�derlund
next reply other threads:[~2018-04-12 9:18 UTC|newest] Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-04-12 9:18 Niklas Söderlund [this message] 2018-04-12 9:18 ` Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") Niklas Söderlund 2018-04-12 10:33 ` Vincent Guittot 2018-04-12 11:15 ` Niklas Söderlund 2018-04-12 11:15 ` Niklas Söderlund [not found] ` <20180412133031.GA551@linaro.org> 2018-04-12 19:43 ` Heiner Kallweit 2018-04-14 11:21 ` Vincent Guittot 2018-04-12 22:39 ` Niklas Söderlund 2018-04-12 22:39 ` Niklas Söderlund 2018-04-14 11:24 ` Vincent Guittot 2018-04-20 16:00 ` Vincent Guittot 2018-04-20 16:00 ` Vincent Guittot 2018-04-20 16:30 ` Joel Fernandes 2018-04-22 22:18 ` Niklas Söderlund 2018-04-22 22:18 ` Niklas Söderlund 2018-04-23 9:54 ` Vincent Guittot 2018-04-23 9:54 ` Vincent Guittot [not found] ` <20180425225603.GA26177@bigcity.dyn.berto.se> 2018-04-26 10:31 ` Vincent Guittot 2018-04-26 10:31 ` Vincent Guittot 2018-04-26 11:48 ` Peter Zijlstra 2018-04-26 11:48 ` Peter Zijlstra 2018-04-26 14:41 ` Niklas Söderlund 2018-04-26 14:41 ` Niklas Söderlund 2018-04-26 15:27 ` Vincent Guittot 2018-04-26 15:38 ` Niklas Söderlund 2018-04-26 15:38 ` Niklas Söderlund 2018-05-02 13:40 ` Geert Uytterhoeven 2018-05-03 9:25 ` [tip:sched/urgent] sched/fair: Fix the update of blocked load when newly idle tip-bot for Vincent Guittot 2018-04-13 20:38 ` Potential problem with 31e77c93e432dec7 ("sched/fair: Update blocked load when newly idle") Niklas Söderlund 2018-04-13 20:38 ` Niklas Söderlund 2018-04-14 11:26 ` Vincent Guittot 2018-04-12 22:06 ` Niklas Söderlund 2018-04-12 22:06 ` Niklas Söderlund
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180412091822.GG12256@bigcity.dyn.berto.se \ --to=niklas.soderlund@ragnatech.se \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-renesas-soc@vger.kernel.org \ --cc=mingo@redhat.com \ --cc=paulmck@linux.vnet.ibm.com \ --cc=peterz@infradead.org \ --cc=vincent.guittot@linaro.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.