linux-kernel.vger.kernel.org archive mirror
* [RT BUG] isolcpus causes sleeping function called from invalid context (4.19.59-rt24)
@ 2019-08-05 10:06 Juri Lelli
  2019-08-07 20:07 ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Juri Lelli @ 2019-08-05 10:06 UTC (permalink / raw)
  To: linux-rt-users
  Cc: LKML, Thomas Gleixner, Sebastian Andrzej Siewior,
	Daniel Bristot de Oliveira, Clark Williams, Steven Rostedt

Hi,

Booting 4.19.59-rt24 with debug options enabled (DEBUG_ATOMIC_SLEEP) I
noticed the following splat (edited for clarity):

--->8---
 Linux version 4.19.59-rt24 (...) (...) #2 SMP PREEMPT RT Mon Aug 5 05:23:26 EDT 2019
 Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.19.59-rt24 ... skew_tick=1 isolcpus=9-35 intel_pstate=disable nosoftlockup nohz=on nohz_full=9-35 rcu_nocbs=9-35
 ...
 smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (family: 0x6, model: 0x3f, stepping: 0x2)
 ...
 smp: Bringing up secondary CPUs ...
 x86: Booting SMP configuration:
 .... node  #0, CPUs:        #1
   #2
   #3
   #4
   #5
   #6
   #7
   #8
 .... node  #1, CPUs:    #9
 BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:967
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/9
 1 lock held by swapper/9/0:
  #0: 00000000eb141bd9 ((pendingb_lock).lock){+.+.}, at: queue_delayed_work_on+0x103/0x580
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>]           (null)
 hardirqs last disabled at (0): [<ffffffffadda7bf8>] copy_process.part.18+0x1928/0x7250
 softirqs last  enabled at (0): [<ffffffffadda7c94>] copy_process.part.18+0x19c4/0x7250
 softirqs last disabled at (0): [<0000000000000000>]           (null)
 Preemption disabled at:
 [<ffffffffadcbdaef>] start_secondary+0x11f/0x530
 CPU: 9 PID: 0 Comm: swapper/9 Not tainted 4.19.59-rt24 #2
 Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/17/2018
 Call Trace:
  dump_stack+0x9a/0xf0
  ___might_sleep.cold.57+0x6a/0x7c
  rt_spin_lock+0x75/0x90
  ? queue_delayed_work_on+0x103/0x580
  queue_delayed_work_on+0x103/0x580
  sched_cpu_starting+0x28c/0x370
  ? sched_cpu_deactivate+0x2b0/0x2b0
  cpuhp_invoke_callback+0x1d0/0x1dc0
  ? _raw_spin_unlock_irqrestore+0x4a/0xf0
  notify_cpu_starting+0x117/0x190
  start_secondary+0x2be/0x530
  ? set_cpu_sibling_map+0x27c0/0x27c0
  secondary_startup_64+0xb6/0xc0
  #10
  #11
  #12
  #13
  #14
  #15
  #16
  #17
 .... node  #0, CPUs:   #18
  #19
  #20
  #21
  #22
  #23
  #24
  #25
  #26
 .... node  #1, CPUs:   #27
  #28
 BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:967
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/28
 1 lock held by swapper/28/0:
  #0: 0000000004707c3e ((pendingb_lock).lock){+.+.}, at: queue_delayed_work_on+0x103/0x580
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>]           (null)
 hardirqs last disabled at (0): [<ffffffffadda7bf8>] copy_process.part.18+0x1928/0x7250
 softirqs last  enabled at (0): [<ffffffffadda7c94>] copy_process.part.18+0x19c4/0x7250
 softirqs last disabled at (0): [<0000000000000000>]           (null)
 Preemption disabled at:
 [<ffffffffadcbdaef>] start_secondary+0x11f/0x530
 CPU: 28 PID: 0 Comm: swapper/28 Tainted: G        W         4.19.59-rt24 #2
 Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/17/2018
 Call Trace:
  dump_stack+0x9a/0xf0
  ___might_sleep.cold.57+0x6a/0x7c
  rt_spin_lock+0x75/0x90
  ? queue_delayed_work_on+0x103/0x580
  queue_delayed_work_on+0x103/0x580
  sched_cpu_starting+0x28c/0x370
  ? sched_cpu_deactivate+0x2b0/0x2b0
  cpuhp_invoke_callback+0x1d0/0x1dc0
  ? _raw_spin_unlock_irqrestore+0x4a/0xf0
  notify_cpu_starting+0x117/0x190
  start_secondary+0x2be/0x530
  ? set_cpu_sibling_map+0x27c0/0x27c0
  secondary_startup_64+0xb6/0xc0
  #29
  #30
  #31
  #32
  #33
  #34
  #35
 smp: Brought up 2 nodes, 36 CPUs
 smpboot: Max logical packages: 2
 smpboot: Total of 36 processors activated (166451.61 BogoMIPS)
--->8---

This only happens if isolcpus are configured at boot.
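
As a quick sanity check on the above, here is a minimal sketch that expands
the isolcpus= list from the command line in the log and confirms that both
CPUs that hit the splat (9 and 28) are in the isolated set. The helper names
parse_cpu_list and isolated_cpus are hypothetical, and the parser only
handles the plain "N" and "N-M" forms, not any other syntax the kernel may
accept for this option.

```python
def parse_cpu_list(spec):
    """Expand a simple CPU list such as "0-2,5" into a set of ints.

    Assumption: only comma-separated single CPUs and N-M ranges appear.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(cmdline):
    """Return the CPUs named by isolcpus= on a kernel command line."""
    prefix = "isolcpus="
    for token in cmdline.split():
        if token.startswith(prefix):
            return parse_cpu_list(token[len(prefix):])
    return set()


# Relevant part of the command line from the boot log.
cmdline = (
    "skew_tick=1 isolcpus=9-35 intel_pstate=disable nosoftlockup "
    "nohz=on nohz_full=9-35 rcu_nocbs=9-35"
)

isolated = isolated_cpus(cmdline)
splat_cpus = {9, 28}  # CPUs named in the two BUG reports

# CPUs that hit the splat without being isolated (expected to be none).
print(sorted(splat_cpus - isolated))
```

This only shows that the splats landed on isolated CPUs; it says nothing
about why the other isolated CPUs did not report.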

AFAIU, RT is reworking workqueues and 5.x-rt shouldn't suffer from this.
As a matter of fact, I could verify that backporting the workqueue
rework all-in change from 5.0-rt [1] fixes this problem.

I'm thus wondering if there is any plan on backporting the rework to
4.19-rt stable, and if that patch has dependencies, or if any alternative
fix might be found for this problem.

Thanks,

Juri

1 - https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=v5.0.19-rt11&id=d15a862f24df983458533aebd6fa207ecdd1095a


* Re: [RT BUG] isolcpus causes sleeping function called from invalid context (4.19.59-rt24)
  2019-08-05 10:06 [RT BUG] isolcpus causes sleeping function called from invalid context (4.19.59-rt24) Juri Lelli
@ 2019-08-07 20:07 ` Steven Rostedt
  2019-08-08  6:47   ` Juri Lelli
  0 siblings, 1 reply; 3+ messages in thread
From: Steven Rostedt @ 2019-08-07 20:07 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-rt-users, LKML, Thomas Gleixner, Sebastian Andrzej Siewior,
	Daniel Bristot de Oliveira, Clark Williams

On Mon, 5 Aug 2019 12:06:46 +0200
Juri Lelli <juri.lelli@redhat.com> wrote:

> This only happens if isolcpus are configured at boot.
> 
> AFAIU, RT is reworking workqueues and 5.x-rt shouldn't suffer from this.
> As a matter of fact, I could verify that backporting the workqueue
> rework all-in change from 5.0-rt [1] fixes this problem.

So you have backported this and it fixed the bug?

> 
> I'm thus wondering if there is any plan on backporting the rework to
> 4.19-rt stable, and if that patch has dependencies, or if any alternative
> fix might be found for this problem.

I could do it after I fix the bug with 4.19.63 merge :-/ (which may be
related. Who knows).

-- Steve


* Re: [RT BUG] isolcpus causes sleeping function called from invalid context (4.19.59-rt24)
  2019-08-07 20:07 ` Steven Rostedt
@ 2019-08-08  6:47   ` Juri Lelli
  0 siblings, 0 replies; 3+ messages in thread
From: Juri Lelli @ 2019-08-08  6:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-rt-users, LKML, Thomas Gleixner, Sebastian Andrzej Siewior,
	Daniel Bristot de Oliveira, Clark Williams

Hi,

On 07/08/19 16:07, Steven Rostedt wrote:
> On Mon, 5 Aug 2019 12:06:46 +0200
> Juri Lelli <juri.lelli@redhat.com> wrote:
> 
> > This only happens if isolcpus are configured at boot.
> > 
> > AFAIU, RT is reworking workqueues and 5.x-rt shouldn't suffer from this.
> > As a matter of fact, I could verify that backporting the workqueue
> > rework all-in change from 5.0-rt [1] fixes this problem.
> 
> So you have backported this and it fixed the bug?

Yeah. I did backport it to a downstream kernel and the splat is gone
(plus I couldn't spot any other problems my backport might have
introduced :).

> > I'm thus wondering if there is any plan on backporting the rework to
> > 4.19-rt stable, and if that patch has dependencies, or if any alternative
> > fix might be found for this problem.
> 
> I could do it after I fix the bug with 4.19.63 merge :-/ (which may be
> related. Who knows).

Ok, thanks!

Best,

Juri
