linux-next.vger.kernel.org archive mirror
* Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b)
       [not found] ` <5d4d7164795c7_43f9afa8b58b0242711@29afa0b1-fa00-407e-a40e-a8edb471126a.mail>
@ 2019-08-09 21:21   ` Nick Desaulniers
  2019-08-10  3:58     ` Nathan Chancellor
  2019-08-12 12:54     ` Will Deacon
  0 siblings, 2 replies; 5+ messages in thread
From: Nick Desaulniers @ 2019-08-09 21:21 UTC
  To: linux-next, Stephen Rothwell, Will Deacon, Catalin Marinas, pauld
  Cc: Linux ARM, Mark Brown, Mark Rutland, Arnd Bergmann, Peter Zijlstra

Did anyone report any issue with last night's -next for arm64?

Some kind of deadlock in online_fair_sched_group.

[   15.256790] ================================
[   15.257025] WARNING: inconsistent lock state
[   15.257243] 5.3.0-rc3-next-20190809 #1 Not tainted
[   15.257393] --------------------------------
[   15.257526] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[   15.258096] init/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   15.258522] (____ptrval____) (&rq->lock){?.-.}, at: online_fair_sched_group+0x78/0xe4
[   15.259170] {IN-HARDIRQ-W} state was registered at:
[   15.259658]   lock_acquire+0x1dc/0x228
[   15.259940]   _raw_spin_lock+0x40/0x54
[   15.260251]   scheduler_tick+0x50/0xfc
[   15.260491]   update_process_times+0x80/0x98
[   15.260677]   tick_periodic+0xd8/0xf0
[   15.260910]   tick_handle_periodic+0x30/0x94
[   15.261126]   arch_timer_handler_virt+0x34/0x40
[   15.261332]   handle_percpu_devid_irq+0x1a8/0x3c4
[   15.261495]   __handle_domain_irq+0x7c/0xbc
[   15.261689]   gic_handle_irq+0x48/0xac
[   15.261881]   el1_irq+0xbc/0x180
[   15.262024]   _raw_spin_unlock_irqrestore+0x4c/0x80
[   15.262263]   tty_register_ldisc+0x58/0x6c
[   15.262430]   n_tty_init+0x18/0x20
[   15.262615]   console_init+0x20/0x3e4
[   15.262820]   start_kernel+0x248/0x3c4
[   15.263079] irq event stamp: 220201
[   15.263362] hardirqs last  enabled at (220201): [<ffff000010e1f334>] _raw_spin_unlock_irqrestore+0x48/0x80
[   15.263731] hardirqs last disabled at (220200): [<ffff000010e1f190>] _raw_spin_lock_irqsave+0x30/0x7c
[   15.264046] softirqs last  enabled at (220196): [<ffff0000100f84c0>] irq_exit+0x114/0x134
[   15.264419] softirqs last disabled at (220185): [<ffff0000100f84c0>] irq_exit+0x114/0x134
[   15.264751]
[   15.264751] other info that might help us debug this:
[   15.265044]  Possible unsafe locking scenario:
[   15.265044]
[   15.265256]        CPU0
[   15.265458]        ----
[   15.265615]   lock(&rq->lock);
[   15.265898]   <Interrupt>
[   15.266087]     lock(&rq->lock);
[   15.266353]
[   15.266353]  *** DEADLOCK ***
[   15.266353]
[   15.266574] no locks held by init/1.
[   15.266784]
[   15.266784] stack backtrace:
[   15.267120] CPU: 0 PID: 1 Comm: init Not tainted 5.3.0-rc3-next-20190809 #1
[   15.267341] Hardware name: linux,dummy-virt (DT)
[   15.267756] Call trace:
[   15.267928]  dump_backtrace+0x0/0x140
[   15.268159]  show_stack+0x14/0x1c
[   15.268341]  dump_stack+0xa8/0x104
[   15.268482]  mark_lock+0xda0/0xda8
[   15.268728]  __lock_acquire+0x300/0x858
[   15.268869]  lock_acquire+0x1dc/0x228
[   15.269057]  _raw_spin_lock+0x40/0x54
[   15.269201]  online_fair_sched_group+0x78/0xe4
[   15.269392]  sched_online_group+0x88/0xac
[   15.269591]  sched_autogroup_create_attach+0xcc/0x12c
[   15.269765]  ksys_setsid+0xe8/0xec
[   15.269990]  __arm64_sys_setsid+0xc/0x18
[   15.270178]  el0_svc_common+0x9c/0x15c
[   15.270317]  el0_svc_handler+0x5c/0x64
[   15.270493]  el0_svc+0x8/0xc

https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/223856448

Guessing related to
commit 6b8fd01b21f5 ("sched/fair: Use rq_lock/unlock in
online_fair_sched_group")

---------- Forwarded message ---------
From: Travis CI <builds@travis-ci.com>
Date: Fri, Aug 9, 2019 at 6:13 AM
Subject: [CRON] Broken: ClangBuiltLinux/continuous-integration#895
(master - 2a3984b)
To: <ndesaulniers@google.com>, <natechancellor@gmail.com>


ClangBuiltLinux / continuous-integration

master

Build #895 was broken
3 hrs, 29 mins, and 39 secs
Nathan Chancellor
2a3984b CHANGESET →

Merge pull request #196 from nathanchance/ppc64

PPC64 big endian
-- 
Thanks,
~Nick Desaulniers


* Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b)
  2019-08-09 21:21   ` Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b) Nick Desaulniers
@ 2019-08-10  3:58     ` Nathan Chancellor
  2019-08-12 12:54     ` Will Deacon
  1 sibling, 0 replies; 5+ messages in thread
From: Nathan Chancellor @ 2019-08-10  3:58 UTC
  To: Nick Desaulniers
  Cc: linux-next, Stephen Rothwell, Will Deacon, Catalin Marinas,
	pauld, Linux ARM, Mark Brown, Mark Rutland, Arnd Bergmann,
	Peter Zijlstra

On Fri, Aug 09, 2019 at 02:21:21PM -0700, Nick Desaulniers wrote:
> Did anyone report any issue with last night's -next for arm64?
> 
> Some kind of deadlock in online_fair_sched_group.
> 
> [   15.256790] ================================
> [   15.257025] WARNING: inconsistent lock state
> [   15.257243] 5.3.0-rc3-next-20190809 #1 Not tainted
> [   15.257393] --------------------------------
> [   15.257526] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [   15.258096] init/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [   15.258522] (____ptrval____) (&rq->lock){?.-.}, at: online_fair_sched_group+0x78/0xe4
> [   15.259170] {IN-HARDIRQ-W} state was registered at:
> [   15.259658]   lock_acquire+0x1dc/0x228
> [   15.259940]   _raw_spin_lock+0x40/0x54
> [   15.260251]   scheduler_tick+0x50/0xfc
> [   15.260491]   update_process_times+0x80/0x98
> [   15.260677]   tick_periodic+0xd8/0xf0
> [   15.260910]   tick_handle_periodic+0x30/0x94
> [   15.261126]   arch_timer_handler_virt+0x34/0x40
> [   15.261332]   handle_percpu_devid_irq+0x1a8/0x3c4
> [   15.261495]   __handle_domain_irq+0x7c/0xbc
> [   15.261689]   gic_handle_irq+0x48/0xac
> [   15.261881]   el1_irq+0xbc/0x180
> [   15.262024]   _raw_spin_unlock_irqrestore+0x4c/0x80
> [   15.262263]   tty_register_ldisc+0x58/0x6c
> [   15.262430]   n_tty_init+0x18/0x20
> [   15.262615]   console_init+0x20/0x3e4
> [   15.262820]   start_kernel+0x248/0x3c4
> [   15.263079] irq event stamp: 220201
> [   15.263362] hardirqs last  enabled at (220201): [<ffff000010e1f334>] _raw_spin_unlock_irqrestore+0x48/0x80
> [   15.263731] hardirqs last disabled at (220200): [<ffff000010e1f190>] _raw_spin_lock_irqsave+0x30/0x7c
> [   15.264046] softirqs last  enabled at (220196): [<ffff0000100f84c0>] irq_exit+0x114/0x134
> [   15.264419] softirqs last disabled at (220185): [<ffff0000100f84c0>] irq_exit+0x114/0x134
> [   15.264751]
> [   15.264751] other info that might help us debug this:
> [   15.265044]  Possible unsafe locking scenario:
> [   15.265044]
> [   15.265256]        CPU0
> [   15.265458]        ----
> [   15.265615]   lock(&rq->lock);
> [   15.265898]   <Interrupt>
> [   15.266087]     lock(&rq->lock);
> [   15.266353]
> [   15.266353]  *** DEADLOCK ***
> [   15.266353]
> [   15.266574] no locks held by init/1.
> [   15.266784]
> [   15.266784] stack backtrace:
> [   15.267120] CPU: 0 PID: 1 Comm: init Not tainted 5.3.0-rc3-next-20190809 #1
> [   15.267341] Hardware name: linux,dummy-virt (DT)
> [   15.267756] Call trace:
> [   15.267928]  dump_backtrace+0x0/0x140
> [   15.268159]  show_stack+0x14/0x1c
> [   15.268341]  dump_stack+0xa8/0x104
> [   15.268482]  mark_lock+0xda0/0xda8
> [   15.268728]  __lock_acquire+0x300/0x858
> [   15.268869]  lock_acquire+0x1dc/0x228
> [   15.269057]  _raw_spin_lock+0x40/0x54
> [   15.269201]  online_fair_sched_group+0x78/0xe4
> [   15.269392]  sched_online_group+0x88/0xac
> [   15.269591]  sched_autogroup_create_attach+0xcc/0x12c
> [   15.269765]  ksys_setsid+0xe8/0xec
> [   15.269990]  __arm64_sys_setsid+0xc/0x18
> [   15.270178]  el0_svc_common+0x9c/0x15c
> [   15.270317]  el0_svc_handler+0x5c/0x64
> [   15.270493]  el0_svc+0x8/0xc
> 
> https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/223856448

While that warning certainly looks like it needs to be dealt with, I just
redid the job and it boots fine; I also verified this locally. I think
the job just got stuck somewhere or the build simply took too long,
so Travis killed the job.

Cheers,
Nathan


* Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b)
  2019-08-09 21:21   ` Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b) Nick Desaulniers
  2019-08-10  3:58     ` Nathan Chancellor
@ 2019-08-12 12:54     ` Will Deacon
  2019-08-12 12:55       ` Will Deacon
  1 sibling, 1 reply; 5+ messages in thread
From: Will Deacon @ 2019-08-12 12:54 UTC
  To: Nick Desaulniers
  Cc: linux-next, Stephen Rothwell, Will Deacon, Catalin Marinas,
	pauld, Linux ARM, Mark Brown, Mark Rutland, Arnd Bergmann,
	Peter Zijlstra, tglx, dietmar.eggemann

Hi Nick,

On Fri, Aug 09, 2019 at 02:21:21PM -0700, Nick Desaulniers wrote:
> Did anyone report any issue with last night's -next for arm64?
> 
> Some kind of deadlock in online_fair_sched_group.
> 
> [   15.256790] ================================
> [   15.257025] WARNING: inconsistent lock state
> [   15.257243] 5.3.0-rc3-next-20190809 #1 Not tainted
> [   15.257393] --------------------------------
> [   15.257526] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [   15.258096] init/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [   15.258522] (____ptrval____) (&rq->lock){?.-.}, at: online_fair_sched_group+0x78/0xe4
> [   15.259170] {IN-HARDIRQ-W} state was registered at:
> [   15.259658]   lock_acquire+0x1dc/0x228
> [   15.259940]   _raw_spin_lock+0x40/0x54
> [   15.260251]   scheduler_tick+0x50/0xfc
> [   15.260491]   update_process_times+0x80/0x98
> [   15.260677]   tick_periodic+0xd8/0xf0
> [   15.260910]   tick_handle_periodic+0x30/0x94
> [   15.261126]   arch_timer_handler_virt+0x34/0x40
> [   15.261332]   handle_percpu_devid_irq+0x1a8/0x3c4
> [   15.261495]   __handle_domain_irq+0x7c/0xbc
> [   15.261689]   gic_handle_irq+0x48/0xac
> [   15.261881]   el1_irq+0xbc/0x180

Ok, so we take rq_lock() off the back of a timer interrupt in irq context...

> [   15.267928]  dump_backtrace+0x0/0x140
> [   15.268159]  show_stack+0x14/0x1c
> [   15.268341]  dump_stack+0xa8/0x104
> [   15.268482]  mark_lock+0xda0/0xda8
> [   15.268728]  __lock_acquire+0x300/0x858
> [   15.268869]  lock_acquire+0x1dc/0x228
> [   15.269057]  _raw_spin_lock+0x40/0x54

... but also with irqs enabled when handling a syscall. Boom.

> [   15.269201]  online_fair_sched_group+0x78/0xe4
> [   15.269392]  sched_online_group+0x88/0xac
> [   15.269591]  sched_autogroup_create_attach+0xcc/0x12c
> [   15.269765]  ksys_setsid+0xe8/0xec
> [   15.269990]  __arm64_sys_setsid+0xc/0x18
> [   15.270178]  el0_svc_common+0x9c/0x15c
> [   15.270317]  el0_svc_handler+0x5c/0x64
> [   15.270493]  el0_svc+0x8/0xc
> 
> https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/223856448
> 
> Guessing related to
> commit 6b8fd01b21f5 ("sched/fair: Use rq_lock/unlock in
> online_fair_sched_group")

Agreed. I think that patch should be using rq_lock_{irqsave,irqrestore}().

Looking at the list archive, it seems that this was already spotted last
week:

  https://lkml.kernel.org/r/dfc8f652-ca98-e30a-546f-e6a2df36e33a@arm.com

Although the proposal there disables irqs unconditionally, which matches
the old behaviour (prior to 6b8fd01b21f5) but feels a bit dodgy given that
the only caller (sched_online_group()) uses the save/restore variants.
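
To make the failure mode concrete, here is a minimal user-space sketch in C
of the scenario lockdep is warning about. It is a toy model, not kernel code:
`toy_cpu`, `toy_lock()`, `toy_lock_irqsave()` and every other `toy_*` name are
hypothetical and only imitate the idea of a plain lock versus the save/restore
variants. It assumes a single CPU and a lock that the timer tick handler also
needs to take.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU state; all names here are illustrative, not kernel APIs. */
struct toy_cpu {
	bool irqs_enabled;	/* are hard interrupts currently unmasked? */
	bool lock_held;		/* is the toy run-queue lock held? */
};

/* Plain lock: takes the lock but leaves the interrupt mask untouched. */
static void toy_lock(struct toy_cpu *cpu)
{
	assert(!cpu->lock_held);	/* the lock is not recursive */
	cpu->lock_held = true;
}

static void toy_unlock(struct toy_cpu *cpu)
{
	cpu->lock_held = false;
}

/* Save/restore variant: remember the mask, mask interrupts, then lock. */
static bool toy_lock_irqsave(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_enabled;

	cpu->irqs_enabled = false;
	toy_lock(cpu);
	return flags;
}

static void toy_unlock_irqrestore(struct toy_cpu *cpu, bool flags)
{
	toy_unlock(cpu);
	cpu->irqs_enabled = flags;
}

/*
 * Model a timer tick firing right now. Returns true if the handler would
 * spin forever on a lock that the interrupted context already holds.
 */
static bool toy_tick_would_deadlock(const struct toy_cpu *cpu)
{
	if (!cpu->irqs_enabled)
		return false;	/* interrupt is masked, handler runs later */
	return cpu->lock_held;	/* handler wants the lock we already hold */
}

/* Syscall path that takes the plain lock with interrupts enabled. */
static bool toy_syscall_plain(void)
{
	struct toy_cpu cpu = { .irqs_enabled = true, .lock_held = false };
	bool deadlock;

	toy_lock(&cpu);
	deadlock = toy_tick_would_deadlock(&cpu);
	toy_unlock(&cpu);
	return deadlock;
}

/* Same path using the save/restore variant. */
static bool toy_syscall_irqsave(void)
{
	struct toy_cpu cpu = { .irqs_enabled = true, .lock_held = false };
	bool flags, deadlock;

	flags = toy_lock_irqsave(&cpu);
	deadlock = toy_tick_would_deadlock(&cpu);
	toy_unlock_irqrestore(&cpu, flags);
	assert(cpu.irqs_enabled);	/* caller's interrupt state is restored */
	return deadlock;
}
```

In this model `toy_syscall_plain()` reports the self-deadlock and
`toy_syscall_irqsave()` does not, because the tick cannot arrive while the
lock is held. The save/restore form also works whether or not the caller
already had interrupts masked, which is the reason for preferring it over
unconditionally disabling and re-enabling them.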

Phil -- is there a fix queued for this somewhere?

Will


* Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b)
  2019-08-12 12:54     ` Will Deacon
@ 2019-08-12 12:55       ` Will Deacon
  2019-08-12 13:16         ` Phil Auld
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2019-08-12 12:55 UTC
  To: Nick Desaulniers
  Cc: linux-next, Stephen Rothwell, Will Deacon, Catalin Marinas,
	pauld, Linux ARM, Mark Brown, Mark Rutland, Arnd Bergmann,
	Peter Zijlstra, tglx, dietmar.eggemann

On Mon, Aug 12, 2019 at 01:54:14PM +0100, Will Deacon wrote:
> Phil -- is there a fix queued for this somewhere?

Ha, tglx beat me by two minutes. This is now fixed in -tip.

Will


* Re: Fwd: [CRON] Broken: ClangBuiltLinux/continuous-integration#895 (master - 2a3984b)
  2019-08-12 12:55       ` Will Deacon
@ 2019-08-12 13:16         ` Phil Auld
  0 siblings, 0 replies; 5+ messages in thread
From: Phil Auld @ 2019-08-12 13:16 UTC
  To: Will Deacon
  Cc: Nick Desaulniers, linux-next, Stephen Rothwell, Will Deacon,
	Catalin Marinas, Linux ARM, Mark Brown, Mark Rutland,
	Arnd Bergmann, Peter Zijlstra, tglx, dietmar.eggemann

On Mon, Aug 12, 2019 at 01:55:43PM +0100 Will Deacon wrote:
> On Mon, Aug 12, 2019 at 01:54:14PM +0100, Will Deacon wrote:
> > Phil -- is there a fix queued for this somewhere?
> 
> Ha, tglx beat me by two minutes. This is now fixed in -tip.
> 
> Will

Yeah, it's now fixed. Sorry about that...


Cheers,
Phil
-- 

