* [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
@ 2018-08-17 20:23 Steven Rostedt
2018-08-18 10:29 ` Mike Galbraith
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2018-08-17 20:23 UTC
To: Thomas Gleixner, Sebastian Andrzej Siewior
Cc: Julia Cartwright, linux-rt-users, LKML
Pulling in stable releases into v4.14-rt I triggered this with my CPU
hotplug test:
------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
Modules linked in: sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp i2c_i801 soundcore floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit iosf_mbi video [last unloaded: speedstep_lib]
CPU: 1 PID: 2944 Comm: mkdumprd Not tainted 4.14.63-test-rt40+ #782
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff880037888d80 task.stack: ffffc90000538000
RIP: 0010:select_fallback_rq+0xc3/0x122
RSP: 0018:ffffc9000053bae0 EFLAGS: 00010046
RAX: 0000000000000100 RBX: 0000000000000100 RCX: 0000000000000000
RDX: 0000000000000100 RSI: 0000000000000100 RDI: ffffffff81c0aac0
RBP: ffff88004e53b600 R08: 0000000000000000 R09: 0000000000000008
R10: ffffc9000053bae0 R11: 0000000000025548 R12: 0000000000000003
R13: 0000000000000002 R14: 0000000000000020 R15: ffff88004e53b600
FS: 00007f5502038700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001460c68 CR3: 00000000756f6000 CR4: 00000000000006e0
Call Trace:
try_to_wake_up+0x1d5/0x30a
? rt_mutex_setprio+0x1f5/0x2e3
__wake_up_q+0x47/0x6f
rt_mutex_postunlock+0x1d/0x60
rt_spin_lock_slowunlock+0x7c/0x87
rt_spin_unlock+0xa/0x1f
release_pages+0x60/0x1ef
tlb_flush_mmu_free+0x28/0x3d
arch_tlb_finish_mmu+0x39/0x5c
tlb_finish_mmu+0x1e/0x2a
exit_mmap+0xd1/0x131
__mmput+0x2f/0xbb
flush_old_exec+0x5f2/0x669
load_elf_binary+0x293/0x13f0
? _raw_spin_lock+0x13/0x1c
? trace_preempt_on+0xd/0x2a
? preempt_count_sub+0x93/0x9c
? migrate_disable+0xe5/0x12b
search_binary_handler+0x81/0x17e
do_execveat_common.isra.33+0x4d6/0x6f6
do_execve+0x1f/0x21
SyS_execve+0x28/0x2f
do_syscall_64+0x6a/0x7a
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x3054ea60b7
RSP: 002b:00007ffe63ee49a8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 0000000001460a90 RCX: 0000003054ea60b7
RDX: 0000000001460ac0 RSI: 0000000001460c70 RDI: 0000000001460a90
RBP: 0000000001460a90 R08: 0000000000000003 R09: 00000000ffffffdf
R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000001460c70 R14: 0000000001460ac0 R15: 000000000145c040
Code: 3b 05 2e 68 0e 01 89 c3 72 da 41 83 fd 01 74 1d 73 13 48 89 ef 41 bd 01 00 00 00 e8 be 7f 05 00 83 cb ff eb cd 41 83 fd 02 75 f5 <0f> 0b 48 c7 c6 40 84 15 82 48 89 ef 41 bd 02 00 00 00 e8 f4 fe
RIP: select_fallback_rq+0xc3/0x122 RSP: ffffc9000053bae0
This isn't one of my normal crashes for the cpu hotplug test. It's
triggering on this part:
static int select_fallback_rq(int cpu, struct task_struct *p)
{
[..]
	for (;;) {
		/* Any allowed, online CPU? */
		for_each_cpu(dest_cpu, p->cpus_ptr) {
			if (!is_cpu_allowed(p, dest_cpu))
				continue;
			goto out;
		}

		/* No more Mr. Nice Guy. */
		switch (state) {
		case cpuset:
			if (IS_ENABLED(CONFIG_CPUSETS)) {
				cpuset_cpus_allowed_fallback(p);
				state = possible;
				break;
			}
			/* Fall-through */
		case possible:
			do_set_cpus_allowed(p, cpu_possible_mask);
			state = fail;
			break;

		case fail:
			BUG(); <-- Panic here
			break;
		}
	}
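For anyone who wants to poke at the control flow outside the kernel, the
loop quoted above can be modelled in userspace. This is only a sketch under
stated assumptions: every name in it (fb_model, fb_pick_cpu, fb_select,
mask_frozen) is hypothetical, the cpuset step is not modelled, and the
mask_frozen flag encodes a guess that widening the affinity mask may not be
visible through the mask the loop actually iterates. The function returns -1
at the point where the kernel would reach BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum fb_state { FB_CPUSET, FB_POSSIBLE, FB_FAIL };

struct fb_model {
	unsigned int usable;    /* CPUs the scheduler would accept right now */
	unsigned int allowed;   /* stands in for the mask behind p->cpus_ptr */
	unsigned int possible;  /* stands in for cpu_possible_mask */
	bool mask_frozen;       /* assumption: widening does not reach 'allowed' */
};

/* First CPU that is both allowed and usable, or -1 if there is none. */
static int fb_pick_cpu(const struct fb_model *m)
{
	assert(m != NULL);
	unsigned int ok = m->allowed & m->usable;

	for (int cpu = 0; cpu < 32; cpu++)
		if (ok & (1u << cpu))
			return cpu;
	return -1;
}

/* Returns the chosen CPU, or -1 where the kernel loop would hit BUG(). */
static int fb_select(struct fb_model *m)
{
	enum fb_state state = FB_CPUSET;

	for (;;) {
		int cpu = fb_pick_cpu(m);

		if (cpu >= 0)
			return cpu;

		switch (state) {
		case FB_CPUSET:
			/* The cpuset fallback is not modelled; move on. */
			state = FB_POSSIBLE;
			break;
		case FB_POSSIBLE:
			/* Widen to every possible CPU, unless the mask is frozen. */
			if (!m->mask_frozen)
				m->allowed = m->possible;
			state = FB_FAIL;
			break;
		case FB_FAIL:
			return -1;
		}
	}
}
```

With allowed set to a single CPU that is missing from usable, fb_select()
finds another CPU once the mask has been widened. If mask_frozen is set, all
three states are exhausted and the model ends up in the fail case.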
I'll investigate it a bit more, but wanted to see if you've seen this too,
and if there's already a fix for it.
Thanks!
-- Steve
* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
2018-08-17 20:23 [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639! Steven Rostedt
@ 2018-08-18 10:29 ` Mike Galbraith
2018-08-18 13:13 ` Mike Galbraith
0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2018-08-18 10:29 UTC
To: Steven Rostedt, Thomas Gleixner, Sebastian Andrzej Siewior
Cc: Julia Cartwright, linux-rt-users, LKML
On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote:
> Pulling in stable releases into v4.14-rt I triggered this with my CPU
> hotplug test:
>
> ------------[ cut here ]------------
> kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> Modules linked in: sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp i2c_i801 soundcore floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit iosf_mbi video [last unloaded: speedstep_lib]
> CPU: 1 PID: 2944 Comm: mkdumprd Not tainted 4.14.63-test-rt40+ #782
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> task: ffff880037888d80 task.stack: ffffc90000538000
> RIP: 0010:select_fallback_rq+0xc3/0x122
I noticed this upstream, and had started hunting for the origin, but
had thought that 4.14-rt was OK. Clearly not the case, but it's not
the stable changes from 4.14.60 onward interacting badly either: virgin
4.14.59-rt37 just reproduced it in a VM clone of my workstation.
-Mike
* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
2018-08-18 10:29 ` Mike Galbraith
@ 2018-08-18 13:13 ` Mike Galbraith
2018-08-19 6:28 ` Mike Galbraith
0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2018-08-18 13:13 UTC
To: Steven Rostedt, Thomas Gleixner, Sebastian Andrzej Siewior
Cc: Julia Cartwright, linux-rt-users, LKML
On Sat, 2018-08-18 at 12:29 +0200, Mike Galbraith wrote:
> On Fri, 2018-08-17 at 16:23 -0400, Steven Rostedt wrote:
> > Pulling in stable releases into v4.14-rt I triggered this with my CPU
> > hotplug test:
> >
> > ------------[ cut here ]------------
> > kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
> > invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > Modules linked in: sunrpc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd shpchp i2c_i801 soundcore floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit iosf_mbi video [last unloaded: speedstep_lib]
> > CPU: 1 PID: 2944 Comm: mkdumprd Not tainted 4.14.63-test-rt40+ #782
> > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> > task: ffff880037888d80 task.stack: ffffc90000538000
> > RIP: 0010:select_fallback_rq+0xc3/0x122
>
> I noticed this upstream, and had started hunting for the origin, but
> had thought that 4.14-rt was OK. Clearly not the case, but it's not
> 4.14.60.. stable changes interacting badly either, virgin 4.14.59-rt37
> just reproduced in a vm clone of my workstation.
4.15.18-rt37 (4.14-rt rolled forward) does not reproduce, nor does
4.16.18-rt12, but 4.17.0-rt5 (v4.16.12-rt5 rolled forward) does, so
it seems it has to be something from the 4.17 cycle that went back to
4.14-stable after the 4.1[56]-stable trees went extinct.
-Mike
* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
2018-08-18 13:13 ` Mike Galbraith
@ 2018-08-19 6:28 ` Mike Galbraith
2018-08-22 16:17 ` Steven Rostedt
0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2018-08-19 6:28 UTC
To: Steven Rostedt, Thomas Gleixner, Sebastian Andrzej Siewior
Cc: Julia Cartwright, linux-rt-users, LKML
On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote:
> it seems it has to be something from the 4.17 cycle that went back to
> 4.14-stable after the 4.1[56]-stable trees went extinct.
See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks")
Fix it like so?
sched: Allow pinned user tasks to be awakened to the CPU they pinned
Since 7af443ee16976, select_fallback_rq() will BUG() if the CPU to
which a task has pinned itself becomes !cpu_active() while it slept.
Serving a 10 megaton eviction notice is neither helpful nor required;
the task will migrate when it can do so.
Signed-off-by: Mike Galbraith <efault@gmx.de>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -980,7 +980,7 @@ static inline bool is_cpu_allowed(struct
 	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 		return false;

-	if (is_per_cpu_kthread(p))
+	if (is_per_cpu_kthread(p) || __migrate_disabled(p))
 		return cpu_online(cpu);

 	return cpu_active(cpu);
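
To see what the one-line change does in isolation, the check can be modelled
in userspace. This is a sketch, not the kernel code: cpu_state, task_state,
mask_has, allowed_before and allowed_after are hypothetical names, and plain
bitmasks stand in for cpu_online(), cpu_active() and the task's allowed mask.
The interesting case is a CPU that is still online but no longer active,
which is the window a CPU passes through while it is being taken down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel state; not kernel APIs. */
struct cpu_state {
	unsigned int online;  /* models cpu_online() as a bitmask */
	unsigned int active;  /* models cpu_active() as a bitmask */
};

struct task_state {
	unsigned int cpus_allowed;  /* models the mask behind p->cpus_ptr */
	bool per_cpu_kthread;       /* models is_per_cpu_kthread(p) */
	bool migrate_disabled;      /* models __migrate_disabled(p) */
};

static bool mask_has(unsigned int mask, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return (mask >> cpu) & 1u;
}

/* The check as it reads before the patch. */
static bool allowed_before(const struct cpu_state *c,
			   const struct task_state *t, int cpu)
{
	if (!mask_has(t->cpus_allowed, cpu))
		return false;

	if (t->per_cpu_kthread)
		return mask_has(c->online, cpu);

	return mask_has(c->active, cpu);
}

/* The check with the one-line change applied. */
static bool allowed_after(const struct cpu_state *c,
			  const struct task_state *t, int cpu)
{
	if (!mask_has(t->cpus_allowed, cpu))
		return false;

	if (t->per_cpu_kthread || t->migrate_disabled)
		return mask_has(c->online, cpu);

	return mask_has(c->active, cpu);
}
```

For a migrate-disabled user task allowed only on a CPU that is online but
inactive, allowed_before() rejects the CPU while allowed_after() accepts it.
A task that is not migrate-disabled is rejected by both, and a CPU that is
fully offline is rejected by both regardless of the task.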
* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
2018-08-19 6:28 ` Mike Galbraith
@ 2018-08-22 16:17 ` Steven Rostedt
2018-08-22 16:33 ` Steven Rostedt
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2018-08-22 16:17 UTC
To: Mike Galbraith
Cc: Thomas Gleixner, Sebastian Andrzej Siewior, Julia Cartwright,
linux-rt-users, LKML
On Sun, 19 Aug 2018 08:28:35 +0200
Mike Galbraith <efault@gmx.de> wrote:
> On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote:
> > it seems it has to be something from the 4.17 cycle that went back to
> > 4.14-stable after the 4.1[56]-stable trees went extinct.
>
> See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks")
>
> Fix it like so?
>
> sched: Allow pinned user tasks to be awakened to the CPU they pinned
>
> Since 7af443ee16976, select_fallback_rq() will BUG() if the CPU to
> which a task has pinned itself becomes !cpu_active() while it slept.
> Serving a 10 megaton eviction notice is neither helpful nor required;
> the task will migrate when it can do so.
This seems to fix the issue that I was seeing. Thanks!
I'll add this to my repo as well.
-- Steve
>
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> ---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,7 +980,7 @@ static inline bool is_cpu_allowed(struct
>  	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
>  		return false;
>
> -	if (is_per_cpu_kthread(p))
> +	if (is_per_cpu_kthread(p) || __migrate_disabled(p))
>  		return cpu_online(cpu);
>
>  	return cpu_active(cpu);
* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
2018-08-22 16:17 ` Steven Rostedt
@ 2018-08-22 16:33 ` Steven Rostedt
2018-08-29 14:00 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2018-08-22 16:33 UTC
To: Mike Galbraith
Cc: Thomas Gleixner, Sebastian Andrzej Siewior, Julia Cartwright,
linux-rt-users, LKML
Sebastian,
On Wed, 22 Aug 2018 12:17:49 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Sun, 19 Aug 2018 08:28:35 +0200
> Mike Galbraith <efault@gmx.de> wrote:
>
> > On Sat, 2018-08-18 at 15:13 +0200, Mike Galbraith wrote:
> > > it seems it has to be something from the 4.17 cycle that went back to
> > > 4.14-stable after the 4.1[56]-stable trees went extinct.
> >
> > See ("sched/core: Require cpu_active() in select_task_rq(), for user tasks")
> >
> > Fix it like so?
> >
> > sched: Allow pinned user tasks to be awakened to the CPU they pinned
> >
> > Since 7af443ee16976, select_fallback_rq() will BUG() if the CPU to
> > which a task has pinned itself becomes !cpu_active() while it slept.
> > Serving a 10 megaton eviction notice is neither helpful nor required;
> > the task will migrate when it can do so.
>
> This seems to fix the issue that I was seeing. Thanks!
>
> I'll add this to my repo as well.
I'm going to hold off on pulling this in until I see it in your tree,
because a stable RT branch should not carry anything that isn't in the
head RT tree. And it appears that this can affect your repo too.
-- Steve
>
> >
> > Signed-off-by: Mike Galbraith <efault@gmx.de>
> > ---
> > kernel/sched/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -980,7 +980,7 @@ static inline bool is_cpu_allowed(struct
> >  	if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> >  		return false;
> >
> > -	if (is_per_cpu_kthread(p))
> > +	if (is_per_cpu_kthread(p) || __migrate_disabled(p))
> >  		return cpu_online(cpu);
> >
> >  	return cpu_active(cpu);
>
* Re: [BUG v4.14-rt] kernel BUG at /work/rt/stable-rt.git/kernel/sched/core.c:1639!
2018-08-22 16:33 ` Steven Rostedt
@ 2018-08-29 14:00 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-08-29 14:00 UTC
To: Steven Rostedt
Cc: Mike Galbraith, Thomas Gleixner, Julia Cartwright, linux-rt-users, LKML
On 2018-08-22 12:33:15 [-0400], Steven Rostedt wrote:
> I'm going to hold off on pulling this in until I see it in your tree,
> because a stable RT branch should not carry anything that isn't in the
> head RT tree. And it appears that this can affect your repo too.
applied.
Sebastian