linux-kernel.vger.kernel.org archive mirror
* [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Steven Rostedt @ 2018-08-17 20:23 UTC
  To: Thomas Gleixner, Sebastian Andrzej Siewior
  Cc: Julia Cartwright, linux-rt-users, LKML

While pulling stable releases into v4.14-rt, I triggered this with my CPU
hotplug test:

------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
Modules linked in: sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp i2c_i801 soundcore floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit iosf_mbi video [last unloaded: speedstep_lib]
CPU: 1 PID: 2944 Comm: mkdumprd Not tainted 4.14.63-test-rt40+ #782
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff880037888d80 task.stack: ffffc90000538000
RIP: 0010:select_fallback_rq+0xc3/0x122
RSP: 0018:ffffc9000053bae0 EFLAGS: 00010046
RAX: 0000000000000100 RBX: 0000000000000100 RCX: 0000000000000000
RDX: 0000000000000100 RSI: 0000000000000100 RDI: ffffffff81c0aac0
RBP: ffff88004e53b600 R08: 0000000000000000 R09: 0000000000000008
R10: ffffc9000053bae0 R11: 0000000000025548 R12: 0000000000000003
R13: 0000000000000002 R14: 0000000000000020 R15: ffff88004e53b600
FS:  00007f5502038700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001460c68 CR3: 00000000756f6000 CR4: 00000000000006e0
Call Trace:
 try_to_wake_up+0x1d5/0x30a
 ? rt_mutex_setprio+0x1f5/0x2e3
 __wake_up_q+0x47/0x6f
 rt_mutex_postunlock+0x1d/0x60
 rt_spin_lock_slowunlock+0x7c/0x87
 rt_spin_unlock+0xa/0x1f
 release_pages+0x60/0x1ef
 tlb_flush_mmu_free+0x28/0x3d
 arch_tlb_finish_mmu+0x39/0x5c
 tlb_finish_mmu+0x1e/0x2a
 exit_mmap+0xd1/0x131
 __mmput+0x2f/0xbb
 flush_old_exec+0x5f2/0x669
 load_elf_binary+0x293/0x13f0
 ? _raw_spin_lock+0x13/0x1c
 ? trace_preempt_on+0xd/0x2a
 ? preempt_count_sub+0x93/0x9c
 ? migrate_disable+0xe5/0x12b
 search_binary_handler+0x81/0x17e
 do_execveat_common.isra.33+0x4d6/0x6f6
 do_execve+0x1f/0x21
 SyS_execve+0x28/0x2f
 do_syscall_64+0x6a/0x7a
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x3054ea60b7
RSP: 002b:00007ffe63ee49a8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 0000000001460a90 RCX: 0000003054ea60b7
RDX: 0000000001460ac0 RSI: 0000000001460c70 RDI: 0000000001460a90
RBP: 0000000001460a90 R08: 0000000000000003 R09: 00000000ffffffdf
R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000001460c70 R14: 0000000001460ac0 R15: 000000000145c040
Code: 3b 05 2e 68 0e 01 89 c3 72 da 41 83 fd 01 74 1d 73 13 48 89 ef 41 bd 01 00 00 00 e8 be 7f 05 00 83 cb ff eb cd 41 83 fd 02 75 f5 <0f> 0b 48 c7 c6 40 84 15 82 48 89 ef 41 bd 02 00 00 00 e8 f4 fe 
RIP: select_fallback_rq+0xc3/0x122 RSP: ffffc9000053bae0


This isn't one of the crashes I normally see with the CPU hotplug test.
It's hitting the BUG() in the fail case of this part:

static int select_fallback_rq(int cpu, struct task_struct *p)
{

[..]

	for (;;) {
		/* Any allowed, online CPU? */
		for_each_cpu(dest_cpu, p->cpus_ptr) {
			if (!is_cpu_allowed(p, dest_cpu))
				continue;

			goto out;
		}

		/* No more Mr. Nice Guy. */
		switch (state) {
		case cpuset:
			if (IS_ENABLED(CONFIG_CPUSETS)) {
				cpuset_cpus_allowed_fallback(p);
				state = possible;
				break;
			}
			/* Fall-through */
		case possible:
			do_set_cpus_allowed(p, cpu_possible_mask);
			state = fail;
			break;

		case fail:
			BUG();	/* <-- Panic here */
			break;
		}
	}


I'll investigate it a bit more, but wanted to see if you've seen this too,
and whether there's already a fix for it.

Thanks!

-- Steve


* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Mike Galbraith @ 2018-08-18 10:29 UTC
  To: Steven Rostedt, Thomas Gleixner, Sebastian Andrzej Siewior
  Cc: Julia Cartwright, linux-rt-users, LKML

On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote:
> Pulling in stable releases into v4.14-rt I triggered this with my CPU
> hotplug test:
> 
> ------------[ cut here ]------------
> kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> Modules linked in: sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp i2c_i801 soundcore floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit iosf_mbi video [last unloaded: speedstep_lib]
> CPU: 1 PID: 2944 Comm: mkdumprd Not tainted 4.14.63-test-rt40+ #782
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> task: ffff880037888d80 task.stack: ffffc90000538000
> RIP: 0010:select_fallback_rq+0xc3/0x122

I noticed this upstream and had started hunting for the origin, but
had thought that 4.14-rt was OK.  Clearly that's not the case, but it's
not the 4.14.60.. stable changes interacting badly either: a virgin
4.14.59-rt37 just reproduced it in a VM clone of my workstation.

	-Mike


* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Mike Galbraith @ 2018-08-18 13:13 UTC
  To: Steven Rostedt, Thomas Gleixner, Sebastian Andrzej Siewior
  Cc: Julia Cartwright, linux-rt-users, LKML

On Sat, 2018-08-18 at 12:29 +0200, Mike Galbraith wrote:
> On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote:
> > Pulling in stable releases into v4.14-rt I triggered this with my CPU
> > hotplug test:
> > 
> > ------------[ cut here ]------------
> > kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
> > invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > Modules linked in: sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp i2c_i801 soundcore floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit iosf_mbi video [last unloaded: speedstep_lib]
> > CPU: 1 PID: 2944 Comm: mkdumprd Not tainted 4.14.63-test-rt40+ #782
> > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> > task: ffff880037888d80 task.stack: ffffc90000538000
> > RIP: 0010:select_fallback_rq+0xc3/0x122
> 
> I noticed this upstream, and had started hunting for the origin, but
> had thought that 4.14-rt was OK.  Clearly not the case, but it's not
> 4.14.60.. stable changes interacting badly either, virgin 4.14.59-rt37
> just reproduced in a vm clone of my workstation.

4.15.18-rt37 (4.14-rt rolled forward) does not reproduce, nor does
4.16.18-rt12, but 4.17.0-rt5 (v4.16.12-rt5 rolled forward) does, so it
seems it has to be something from the 4.17 cycle that went back to
4.14-stable after the 4.1[56]-stable trees went extinct.

	-Mike


* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Mike Galbraith @ 2018-08-19  6:28 UTC
  To: Steven Rostedt, Thomas Gleixner, Sebastian Andrzej Siewior
  Cc: Julia Cartwright, linux-rt-users, LKML

On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote:
> seems it has be something from the 4.17 cycle that went back to 4.14-
> stable after 4.1[56]-stable trees went extinct.

See commit 7af443ee16976 ("sched/core: Require cpu_active() in select_task_rq(), for user tasks").

Fix it like so?

sched: Allow pinned user tasks to be awakened to the CPU they pinned

Since 7af443ee16976, select_fallback_rq() will BUG() if the CPU to
which a task has pinned itself (via migrate_disable()) becomes
!cpu_active() while it slept.  Serving a 10 megaton eviction notice is
neither helpful nor required; the task will migrate when it can do so.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -980,7 +980,7 @@ static inline bool is_cpu_allowed(struct
 	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 		return false;
 
-	if (is_per_cpu_kthread(p))
+	if (is_per_cpu_kthread(p) || __migrate_disabled(p))
 		return cpu_online(cpu);
 
 	return cpu_active(cpu);


* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Steven Rostedt @ 2018-08-22 16:17 UTC
  To: Mike Galbraith
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, Julia Cartwright,
	linux-rt-users, LKML

On Sun, 19 Aug 2018 08:28:35 +0200
Mike Galbraith <efault@gmx.de> wrote:

> On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote:
> > seems it has be something from the 4.17 cycle that went back to 4.14-
> > stable after 4.1[56]-stable trees went extinct.  
> 
> See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks")
> 
> Fix it like so?
> 
> sched: Allow pinned user tasks to be awakened to the CPU they pinned
> 
> Since 7af443ee16976, select_fallback_rq() will BUG() if the CPU to
> which a task has pinned itself and pinned becomes !cpu_active()
> while it slept.  Serving a 10 megaton eviction notice is neither
> helpful nor required, the task will migrate when it can do so.

This seems to fix the issue that I was seeing. Thanks!

I'll add this to my repo as well.

-- Steve

> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> ---
>  kernel/sched/core.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,7 +980,7 @@ static inline bool is_cpu_allowed(struct
>  	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>  		return false;
>  
> -	if (is_per_cpu_kthread(p))
> +	if (is_per_cpu_kthread(p) || __migrate_disabled(p))
>  		return cpu_online(cpu);
>  
>  	return cpu_active(cpu);



* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Steven Rostedt @ 2018-08-22 16:33 UTC
  To: Mike Galbraith
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, Julia Cartwright,
	linux-rt-users, LKML


Sebastian,


On Wed, 22 Aug 2018 12:17:49 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Sun, 19 Aug 2018 08:28:35 +0200
> Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote:
> > > seems it has be something from the 4.17 cycle that went back to 4.14-
> > > stable after 4.1[56]-stable trees went extinct.  
> > 
> > See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks")
> > 
> > Fix it like so?
> > 
> > sched: Allow pinned user tasks to be awakened to the CPU they pinned
> > 
> > Since 7af443ee16976, select_fallback_rq() will BUG() if the CPU to
> > which a task has pinned itself and pinned becomes !cpu_active()
> > while it slept.  Serving a 10 megaton eviction notice is neither
> > helpful nor required, the task will migrate when it can do so.
> 
> This seems to fix the issue that I was seeing. Thanks!
> 
> I'll add this to my repo as well.

I'm going to hold off on pulling this in until I see it in your tree,
because a stable RT branch should not carry anything that isn't in the
head RT tree. And it appears that this can affect your repo too.

-- Steve

> 
> > 
> > Signed-off-by: Mike Galbraith <efault@gmx.de>
> > ---
> >  kernel/sched/core.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -980,7 +980,7 @@ static inline bool is_cpu_allowed(struct
> >  	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> >  		return false;
> >  
> > -	if (is_per_cpu_kthread(p))
> > +	if (is_per_cpu_kthread(p) || __migrate_disabled(p))
> >  		return cpu_online(cpu);
> >  
> >  	return cpu_active(cpu);
> 



* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
From: Sebastian Andrzej Siewior @ 2018-08-29 14:00 UTC
  To: Steven Rostedt
  Cc: Mike Galbraith, Thomas Gleixner, Julia Cartwright, linux-rt-users, LKML

On 2018-08-22 12:33:15 [-0400], Steven Rostedt wrote:
> I'm going to hold off on pulling this in until I see it in your tree,
> because a stable RT branch should not carry anything that isn't in the
> head RT tree. And it appears that this can affect your repo too.

applied.

Sebastian
