* [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
@ 2015-03-11 14:47 Jan Kiszka
  2015-03-11 15:12 ` Philippe Gerum
  2015-03-16 14:43 ` Philippe Gerum
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Kiszka @ 2015-03-11 14:47 UTC
  To: Philippe Gerum, Xenomai

Hi Philippe,

just happened to trigger the oops below by running

gdb --args smokey --run=8

That run already has troubles and generates different output than
running the test without gdb surveillance, probably due to unexpected
mode switches. But the real problem is that running the test again
afterwards, with or without gdb, causes the oops. Registers contain
suspicious "dead" patterns, thus we access invalid list elements. Do we
miss a cleanup when terminating smokey in the gdb session?

Jan

---

[  307.474060] [Xenomai] switching main to secondary mode after exception #13 in kernel-space at 0xffffffff8114aeb4 (pid 983)
[  307.478710] general protection fault: 0000 [#2] SMP DEBUG_PAGEALLOC
[  307.480013] Modules linked in: xt_limit 9p fscache ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables microcode rt_e1000 psmouse 9pnet_virtio serio_raw 9pnet pcspkr i2c_i801 intel_agp rtnet intel_gtt virtio_pci virtio_ring virtio ahci libahci e1000 floppy
[  307.480013] CPU: 0 PID: 983 Comm: main Tainted: G      D      3.14.28+ #228
[  307.480013] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org 04/01/2014
[  307.480013] task: ffff88003fa54a80 ti: ffff88003b064000 task.ti: ffff88003b064000
[  307.480013] RIP: 0010:[<ffffffff8114aeb4>]  [<ffffffff8114aeb4>] xnsched_tp_forget+0x14/0x50
[  307.480013] RSP: 0018:ffff88003b067dd0  EFLAGS: 00010246
[  307.480013] RAX: dead000000200200 RBX: ffff88003ad07040 RCX: ffffffff818f59f8
[  307.480013] RDX: dead000000100100 RSI: ffffffff81a1a5a0 RDI: ffff88003ad07040
[  307.480013] RBP: ffff88003b067df8 R08: 000000000000011b R09: ffffffff816ef2e1
[  307.480013] R10: 000000000000004b R11: ffff88003b067e98 R12: ffffffff81a1a5a0
[  307.480013] R13: ffff88003b067e08 R14: ffff88003ad08100 R15: ffff88003ad08100
[  307.480013] FS:  00007fda1a1c3740(0000) GS:ffff88003d400000(0000) knlGS:0000000000000000
[  307.480013] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  307.480013] CR2: 0000000002358ff8 CR3: 000000003b729000 CR4: 00000000000006f0
[  307.480013] I-pipe domain Linux
[  307.480013] Stack:
[  307.480013]  ffff88003b067df8 ffffffff81137ace dead0000000ffb58 ffff88003d6a7cc8
[  307.480013]  ffff88003d6a08c0 ffff88003b067e48 ffffffff8114b20f ffff880000000032
[  307.480013]  ffff88003ad08100 000000c3712f020f 000000000000000b 000000000000000b
[  307.480013] Call Trace:
[  307.480013]  [<ffffffff81137ace>] ? xnsched_set_policy+0x8e/0x170
[  307.480013]  [<ffffffff8114b20f>] xnsched_tp_set_schedule+0x7f/0x100
[  307.480013]  [<ffffffff8117787f>] __cobalt_sched_setconfig_np+0x2ff/0x1020
[  307.480013]  [<ffffffff81177030>] ? hard_local_irq_disable+0x30/0x30
[  307.480013]  [<ffffffff811785a0>] ? __cobalt_sched_setconfig_np+0x1020/0x1020
[  307.480013]  [<ffffffff811785bc>] cobalt_sched_setconfig_np+0x1c/0x20
[  307.480013]  [<ffffffff8117f82a>] ipipe_syscall_hook+0x16a/0x320
[  307.480013]  [<ffffffff810f77b7>] __ipipe_notify_syscall+0x97/0x3e0
[  307.480013]  [<ffffffff816d0372>] pipeline_syscall+0xa/0x17
[  307.480013] Code: 89 97 b0 05 00 00 48 89 0a 5d c3 0f 1f 40 00 b8 ea ff ff ff 5d c3 90 e8 db 51 58 00 48 8b 87 b0 05 00 00 48 8b 97 a8 05 00 00 55 <48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 48 89 87 a8 
[  307.480013] RIP  [<ffffffff8114aeb4>] xnsched_tp_forget+0x14/0x50
[  307.480013]  RSP <ffff88003b067dd0>
[  307.541656] ---[ end trace bfb88524a477f00e ]---
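
The "dead" values in RAX and RDX look like the kernel's list poison markers, which list_del() stores into an entry after unlinking it. A minimal user-space sketch for classifying such register values follows. The constants are an assumption (3.14-era include/linux/poison.h combined with the x86_64 default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000), and classify_poison() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values: 3.14-era include/linux/poison.h on x86_64. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (0x00100100ULL + POISON_POINTER_DELTA) /* ->next after list_del() */
#define LIST_POISON2 (0x00200200ULL + POISON_POINTER_DELTA) /* ->prev after list_del() */

/* Hypothetical helper: name the poison marker a register value matches. */
static const char *classify_poison(uint64_t value)
{
    if (value == LIST_POISON1)
        return "LIST_POISON1";
    if (value == LIST_POISON2)
        return "LIST_POISON2";
    /* Anything else carrying the 0xdead prefix is still a poisoned pointer. */
    if ((value & 0xffff000000000000ULL) == POISON_POINTER_DELTA)
        return "other poisoned pointer";
    return "not a poison value";
}
```

Under these assumptions, RDX (dead000000100100) matches LIST_POISON1 and RAX (dead000000200200) matches LIST_POISON2, which is what a list entry looks like after list_del() has already run on it. The stack word dead0000000ffb58 falls in the same poisoned range: it equals LIST_POISON1 minus 0x5a8, the displacement visible in the Code: bytes (a8 05 00 00).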

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-11 14:47 [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb Jan Kiszka
@ 2015-03-11 15:12 ` Philippe Gerum
  2015-03-16 14:43 ` Philippe Gerum
  1 sibling, 0 replies; 14+ messages in thread
From: Philippe Gerum @ 2015-03-11 15:12 UTC
  To: Jan Kiszka, Xenomai


Hi Jan,

On 03/11/2015 03:47 PM, Jan Kiszka wrote:
> Hi Philippe,
> 
> just happened to trigger the oops below by running
> 
> gdb --args smokey --run=8
> 
> That run already has troubles and generates different output than
> running the test without gdb surveillance, probably due to unexpected
> mode switches. But the real problem is that running the test again
> afterwards, with or without gdb, causes the oops. Registers contain
> suspicious "dead" patterns, thus we access invalid list elements. Do we
> miss a cleanup when terminating smokey in the gdb session?
> 
> [  307.480013]  [<ffffffff8114b20f>] xnsched_tp_set_schedule+0x7f/0x100
> [  307.480013]  [<ffffffff8117787f>] __cobalt_sched_setconfig_np+0x2ff/0x1020

Looks like some TCBs were left in the TP queue of the scheduler slot
after exit (xnsched->tp). Normally, xnsched_tp_forget() should pull any
exiting thread which undergoes SCHED_TP from that queue. Now, it looks
like some code somewhere does not actually share this comfortable
assumption.

I won't be able to debug until next week, but for sure this needs
fixing. Btw, I'll push a general fix for the shared heap issue (broken
inter-process binding to shared objects) when I'm back. Brown paper bag
issue.

-- 
Philippe.



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-11 14:47 [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb Jan Kiszka
  2015-03-11 15:12 ` Philippe Gerum
@ 2015-03-16 14:43 ` Philippe Gerum
  2015-03-16 15:31   ` Jan Kiszka
  1 sibling, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2015-03-16 14:43 UTC
  To: Jan Kiszka, Xenomai

On 03/11/2015 03:47 PM, Jan Kiszka wrote:
> Hi Philippe,
> 
> just happened to trigger the oops below by running
> 
> gdb --args smokey --run=8
> 
> That run already has troubles and generates different output than
> running the test without gdb surveillance, probably due to unexpected
> mode switches.

Clearly, yes. GDB causes the test program to leave primary mode, which
changes the scheduling order, and therefore the output which depends on it.

> But the real problem is that running the test again
> afterwards, with or without gdb, causes the oops. Registers contain
> suspicious "dead" patterns, thus we access invalid list elements. Do we
> miss a cleanup when terminating smokey in the gdb session?
> 

I could not reproduce this bug yet.

There is no reason for ptracing the application to have any impact on
the housekeeping chores when it exits. The backtrace shows that
xnsched_tp_set_schedule() is walking through tp->threads, which seems to
link to a stale tcb. xnsched_tp_forget() would then be called twice,
leading to the fault.

Normally, a thread that undergoes TP scheduling should be automatically
removed from tp->threads upon exit after this sequence took place:

handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
xnsched_forget -> xnsched_tp_forget

For that bug to happen, either this assumption has to be wrong, or
xnsched_set_policy() is being silly at some point.
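
Why a second xnsched_tp_forget() on the same TCB would fault can be sketched in user space as below. This is a minimal model under stated assumptions, not the Cobalt code: struct list_node mimics the kernel's struct list_head, the poison_next/poison_prev objects stand in for the kernel's poison pointers, and list_del_checked() is a hypothetical variant that reports the second unlink instead of writing through a poisoned pointer.

```c
#include <stddef.h>

/* Minimal doubly linked list modeled on the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Stand-ins for the kernel's poison pointers; never dereferenced here. */
static struct list_node poison_next, poison_prev;

static void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_node *entry, struct list_node *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Unlink entry and poison its pointers. Returns 0 on success, or -1 if the
 * entry was already unlinked. An unchecked list_del() would instead store
 * through the poisoned next pointer, which is where the oops triggers.
 */
static int list_del_checked(struct list_node *entry)
{
    if (entry->next == &poison_next || entry->prev == &poison_prev)
        return -1;
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = &poison_next;
    entry->prev = &poison_prev;
    return 0;
}
```

Adding one entry, then calling list_del_checked() on it twice, returns 0 and then -1: the first call corresponds to the regular exit path, the second to the unexpected call on a stale TCB.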

Is this 100% reproducible on your end, and does this require the initial
gdb run to show up, or would that break even when running the sched_tp
twice without gdb?

-- 
Philippe.



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 14:43 ` Philippe Gerum
@ 2015-03-16 15:31   ` Jan Kiszka
  2015-03-16 16:00     ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 15:31 UTC
  To: Philippe Gerum, Xenomai

On 2015-03-16 15:43, Philippe Gerum wrote:
> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>> Hi Philippe,
>>
>> just happened to trigger the oops below by running
>>
>> gdb --args smokey --run=8
>>
>> That run already has troubles and generates different output than
>> running the test without gdb surveillance, probably due to unexpected
>> mode switches.
> 
> Clearly, yes. GDB causes the test program to leave primary mode, which
> changes the scheduling order, and therefore the output which depends on it.
> 
>> But the real problem is that running the test again
>> afterwards, with or without gdb, causes the oops. Registers contain
>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>> miss a cleanup when terminating smokey in the gdb session?
>>
> 
> I could not reproduce this bug yet.
> 
> There is no reason for ptracing the application to have any impact on
> the housekeeping chores when it exits. The backtrace shows that
> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
> link to a stale tcb. xnsched_tp_forget() would then be called twice,
> leading to the fault.
> 
> Normally, a thread that undergoes TP scheduling should be automatically
> removed from tp->threads upon exit after this sequence took place:
> 
> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
> xnsched_forget -> xnsched_tp_forget
> 
> For that bug to happen, either this assumption has to be wrong, or
> xnsched_set_policy() is being silly at some point.
> 
> Is this 100% reproducible on your end, and does this require the initial
> gdb run to show up, or would that break even when running the sched_tp
> twice without gdb?

It is always reproducible, also with current next branch. And you need
to run gdb beforehand, yes.

I'll see if I can look into details.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 15:31   ` Jan Kiszka
@ 2015-03-16 16:00     ` Jan Kiszka
  2015-03-16 16:02       ` Jan Kiszka
  2015-03-16 16:09       ` Philippe Gerum
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 16:00 UTC
  To: Philippe Gerum, Xenomai

On 2015-03-16 16:31, Jan Kiszka wrote:
> On 2015-03-16 15:43, Philippe Gerum wrote:
>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>> Hi Philippe,
>>>
>>> just happened to trigger the oops below by running
>>>
>>> gdb --args smokey --run=8
>>>
>>> That run already has troubles and generates different output than
>>> running the test without gdb surveillance, probably due to unexpected
>>> mode switches.
>>
>> Clearly, yes. GDB causes the test program to leave primary mode, which
>> changes the scheduling order, and therefore the output which depends on it.
>>
>>> But the real problem is that running the test again
>>> afterwards, with or without gdb, causes the oops. Registers contain
>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>> miss a cleanup when terminating smokey in the gdb session?
>>>
>>
>> I could not reproduce this bug yet.
>>
>> There is no reason for ptracing the application to have any impact on
>> the housekeeping chores when it exits. The backtrace shows that
>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>> leading to the fault.
>>
>> Normally, a thread that undergoes TP scheduling should be automatically
>> removed from tp->threads upon exit after this sequence took place:
>>
>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>> xnsched_forget -> xnsched_tp_forget
>>
>> For that bug to happen, either this assumption has to be wrong, or
>> xnsched_set_policy() is being silly at some point.
>>
>> Is this 100% reproducible on your end, and does this require the initial
>> gdb run to show up, or would that break even when running the sched_tp
>> twice without gdb?
> 
> It is always reproducible, also with current next branch. And you need
> to run gdb beforehand, yes.
> 
> I'll see if I can look into details.

During cleanup of the first run under gdb, I get this one as expected
(and two more hits for threads B and C):

Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
175     {
(gdb) p thread->name
$3 = "threadA", '\000' <repeats 24 times>
(gdb) bt
#0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
#1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
#2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
#3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
#4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
#5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
#6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
#7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
#8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
#9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
#10 <signal handler called>
#11 0x00007ffff7354146 in ?? ()
#12 0xffff88003cfcde10 in ?? ()
#13 0xffffffff81a09260 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) c
Continuing.


But then, when I start the test again (with or without gdb), I also get
this right at the beginning:


Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
175     {
(gdb) bt
#0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
#1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
#2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
#3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
#4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
#5  __cobalt_sched_setconfig_np (cpu=<optimized out>, policy=11, u_config=<optimized out>, len=168, fetch_config=<optimized out>, ack_config=<optimized out>) at ../kernel/xenomai/posix/sched.c:617
#6  0xffffffff8117d31c in cobalt_sched_setconfig_np (cpu=<optimized out>, policy=<optimized out>, u_config=<optimized out>, len=<optimized out>) at ../kernel/xenomai/posix/sched.c:639
#7  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#8  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003b813f58) at ../kernel/xenomai/posix/syscall.c:1107
#9  0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
#10 <signal handler called>
#11 0x00007f8d2f9d12c0 in ?? ()
Backtrace stopped: Cannot access memory at address 0x20040


Any bell ringing on your side?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 16:00     ` Jan Kiszka
@ 2015-03-16 16:02       ` Jan Kiszka
  2015-03-16 16:09       ` Philippe Gerum
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 16:02 UTC
  To: Philippe Gerum, Xenomai

On 2015-03-16 17:00, Jan Kiszka wrote:
> On 2015-03-16 16:31, Jan Kiszka wrote:
>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>> Hi Philippe,
>>>>
>>>> just happened to trigger the oops below by running
>>>>
>>>> gdb --args smokey --run=8
>>>>
>>>> That run already has troubles and generates different output than
>>>> running the test without gdb surveillance, probably due to unexpected
>>>> mode switches.
>>>
>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>> changes the scheduling order, and therefore the output which depends on it.
>>>
>>>> But the real problem is that running the test again
>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>
>>>
>>> I could not reproduce this bug yet.
>>>
>>> There is no reason for ptracing the application to have any impact on
>>> the housekeeping chores when it exits. The backtrace shows that
>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>> leading to the fault.
>>>
>>> Normally, a thread that undergoes TP scheduling should be automatically
>>> removed from tp->threads upon exit after this sequence took place:
>>>
>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>> xnsched_forget -> xnsched_tp_forget
>>>
>>> For that bug to happen, either this assumption has to be wrong, or
>>> xnsched_set_policy() is being silly at some point.
>>>
>>> Is this 100% reproducible on your end, and does this require the initial
>>> gdb run to show up, or would that break even when running the sched_tp
>>> twice without gdb?
>>
>> It is always reproducible, also with current next branch. And you need
>> to run gdb beforehand, yes.
>>
>> I'll see if I can look into details.
> 
> During cleanup of the first run under gdb, I get this one as expected
> (and two more hits for threads B and C):
> 
> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> 175     {
> (gdb) p thread->name
> $3 = "threadA", '\000' <repeats 24 times>
> (gdb) bt
> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
> #10 <signal handler called>
> #11 0x00007ffff7354146 in ?? ()
> #12 0xffff88003cfcde10 in ?? ()
> #13 0xffffffff81a09260 in ?? ()
> #14 0x0000000000000000 in ?? ()
> (gdb) c
> Continuing.
> 
> 
> But then, when I start the test again (with or without gdb), I also get
> this right at the beginning:
> 
> 
> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175

Forgot to print: thread->name is "threadA" here.

> 175     {
> (gdb) bt
> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
> #5  __cobalt_sched_setconfig_np (cpu=<optimized out>, policy=11, u_config=<optimized out>, len=168, fetch_config=<optimized out>, ack_config=<optimized out>) at ../kernel/xenomai/posix/sched.c:617
> #6  0xffffffff8117d31c in cobalt_sched_setconfig_np (cpu=<optimized out>, policy=<optimized out>, u_config=<optimized out>, len=<optimized out>) at ../kernel/xenomai/posix/sched.c:639
> #7  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
> #8  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003b813f58) at ../kernel/xenomai/posix/syscall.c:1107
> #9  0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
> #10 <signal handler called>
> #11 0x00007f8d2f9d12c0 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x20040
> 
> 
> Any bell ringing on your side?
> 
> Jan
> 




* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 16:00     ` Jan Kiszka
  2015-03-16 16:02       ` Jan Kiszka
@ 2015-03-16 16:09       ` Philippe Gerum
  2015-03-16 16:42         ` Jan Kiszka
  1 sibling, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2015-03-16 16:09 UTC
  To: Jan Kiszka, Xenomai

On 03/16/2015 05:00 PM, Jan Kiszka wrote:
> On 2015-03-16 16:31, Jan Kiszka wrote:
>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>> Hi Philippe,
>>>>
>>>> just happened to trigger the oops below by running
>>>>
>>>> gdb --args smokey --run=8
>>>>
>>>> That run already has troubles and generates different output than
>>>> running the test without gdb surveillance, probably due to unexpected
>>>> mode switches.
>>>
>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>> changes the scheduling order, and therefore the output which depends on it.
>>>
>>>> But the real problem is that running the test again
>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>
>>>
>>> I could not reproduce this bug yet.
>>>
>>> There is no reason for ptracing the application to have any impact on
>>> the housekeeping chores when it exits. The backtrace shows that
>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>> leading to the fault.
>>>
>>> Normally, a thread that undergoes TP scheduling should be automatically
>>> removed from tp->threads upon exit after this sequence took place:
>>>
>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>> xnsched_forget -> xnsched_tp_forget
>>>
>>> For that bug to happen, either this assumption has to be wrong, or
>>> xnsched_set_policy() is being silly at some point.
>>>
>>> Is this 100% reproducible on your end, and does this require the initial
>>> gdb run to show up, or would that break even when running the sched_tp
>>> twice without gdb?
>>
>> It is always reproducible, also with current next branch. And you need
>> to run gdb beforehand, yes.
>>
>> I'll see if I can look into details.
> 
> During cleanup of the first run under gdb, I get this one as expected
> (and two more hits for threads B and C):
> 
> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> 175     {
> (gdb) p thread->name
> $3 = "threadA", '\000' <repeats 24 times>
> (gdb) bt
> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
> #10 <signal handler called>
> #11 0x00007ffff7354146 in ?? ()
> #12 0xffff88003cfcde10 in ?? ()
> #13 0xffffffff81a09260 in ?? ()
> #14 0x0000000000000000 in ?? ()
> (gdb) c
> Continuing.
> 
> 
> But then, when I start the test again (with or without gdb), I also get
> this right at the beginning:
> 
> 
> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> 175     {
> (gdb) bt
> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284

Yes, this one is the weird one. Normally, we should not find any TCB
lingering in tp->threads, once threads A, B and C have exited and
unlinked from it via xnsched_forget().

That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
threads currently undergoing a TP schedule to the RT class, since we are
about to change the scheduling data (i.e. time windows and partitions).
Why tp->threads is not empty when running xnsched_tp_set_schedule() at
the next program invocation is what needs to be explained.
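
The draining step described above can be sketched as follows. This is only an illustration under stated assumptions, not the Cobalt implementation: struct tcb is a heavily reduced stand-in for a thread control block, and tp_declare(), tp_forget() and tp_drain_to_rt() are hypothetical names.

```c
#include <stddef.h>

enum sched_class { CLASS_RT, CLASS_TP };

struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical, heavily reduced stand-in for a thread control block. */
struct tcb {
    enum sched_class sched_class;
    struct list_node tp_link;   /* membership in the per-CPU TP list */
};

/* Recover the enclosing tcb from its embedded list node. */
#define tcb_of(node) \
    ((struct tcb *)((char *)(node) - offsetof(struct tcb, tp_link)))

static void tp_list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

/* Put a thread under TP scheduling and link it to the TP list. */
static void tp_declare(struct tcb *t, struct list_node *tp_threads)
{
    t->sched_class = CLASS_TP;
    t->tp_link.prev = tp_threads->prev;
    t->tp_link.next = tp_threads;
    tp_threads->prev->next = &t->tp_link;
    tp_threads->prev = &t->tp_link;
}

/* Unlink a thread from the TP list. */
static void tp_forget(struct tcb *t)
{
    t->tp_link.next->prev = t->tp_link.prev;
    t->tp_link.prev->next = t->tp_link.next;
    t->tp_link.next = t->tp_link.prev = &t->tp_link;
}

/* Move every thread still on the TP list to the RT class; returns the count. */
static int tp_drain_to_rt(struct list_node *tp_threads)
{
    struct list_node *pos = tp_threads->next, *next;
    struct tcb *t;
    int moved = 0;

    while (pos != tp_threads) {
        next = pos->next;       /* saved first: tp_forget() rewrites pos's links */
        t = tcb_of(pos);
        tp_forget(t);
        t->sched_class = CLASS_RT;
        moved++;
        pos = next;
    }
    return moved;
}
```

In this model the list is expected to be empty whenever no TP thread is alive, so a drain at the start of a fresh program run should move nothing. A walk that still finds an entry there is operating on a TCB whose owner has already exited.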

-- 
Philippe.



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 16:09       ` Philippe Gerum
@ 2015-03-16 16:42         ` Jan Kiszka
  2015-03-16 17:16           ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 16:42 UTC
  To: Philippe Gerum, Xenomai

On 2015-03-16 17:09, Philippe Gerum wrote:
> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>> Hi Philippe,
>>>>>
>>>>> just happened to trigger the oops below by running
>>>>>
>>>>> gdb --args smokey --run=8
>>>>>
>>>>> That run already has troubles and generates different output than
>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>> mode switches.
>>>>
>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>
>>>>> But the real problem is that running the test again
>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>
>>>>
>>>> I could not reproduce this bug yet.
>>>>
>>>> There is no reason for ptracing the application to have any impact on
>>>> the housekeeping chores when it exits. The backtrace shows that
>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>> leading to the fault.
>>>>
>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>> removed from tp->threads upon exit after this sequence took place:
>>>>
>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>> xnsched_forget -> xnsched_tp_forget
>>>>
>>>> For that bug to happen, either this assumption has to be wrong, or
>>>> xnsched_set_policy() is being silly at some point.
>>>>
>>>> Is this 100% reproducible on your end, and does this require the initial
>>>> gdb run to show up, or would that break even when running the sched_tp
>>>> twice without gdb?
>>>
>>> It is always reproducible, also with current next branch. And you need
>>> to run gdb beforehand, yes.
>>>
>>> I'll see if I can look into details.
>>
>> During cleanup of the first run under gdb, I get this one as expected
>> (and two more hits for threads B and C):
>>
>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> 175     {
>> (gdb) p thread->name
>> $3 = "threadA", '\000' <repeats 24 times>
>> (gdb) bt
>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
>> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
>> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
>> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>> #10 <signal handler called>
>> #11 0x00007ffff7354146 in ?? ()
>> #12 0xffff88003cfcde10 in ?? ()
>> #13 0xffffffff81a09260 in ?? ()
>> #14 0x0000000000000000 in ?? ()
>> (gdb) c
>> Continuing.
>>
>>
>> But then, when I start the test again (with or without gdb), I also get
>> this right at the beginning:
>>
>>
>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> 175     {
>> (gdb) bt
>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
>> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
> 
> Yes, this one is the weird one. Normally, we should not find any TCB
> lingering in tp->threads, once threads A, B and C have exited and
> unlinked from it via xnsched_forget().
> 
> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
> threads currently undergoing a TP schedule to the RT class, since we are
> about to change the scheduling data (i.e. time windows and partitions).
> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
> the next program invocation is what needs to be explained.

Under gdb supervision, some threads are added to the tp pool multiple
times:

Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
160     {
(gdb) bt
#0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
#1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
#2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
#3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
#4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
#5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
#6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
#7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
#8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
#10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
#11 <signal handler called>
#12 0x00007ffff77b7d25 in ?? ()
#13 0x5a5a5a5a00000001 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
160     {
(gdb) bt
#0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
#1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
#2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
#3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
#4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
#5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
#6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
#7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
#8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
#10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
#11 <signal handler called>
#12 0x00007ffff77b7d25 in ?? ()
#13 0x5a5a5a5a00000001 in ?? ()
#14 0x0000000000000000 in ?? ()


Could it be that some syscall is restarted after being interrupted by
ptrace?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 16:42         ` Jan Kiszka
@ 2015-03-16 17:16           ` Jan Kiszka
  2015-03-16 19:24             ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 17:16 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

On 2015-03-16 17:42, Jan Kiszka wrote:
> On 2015-03-16 17:09, Philippe Gerum wrote:
>> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>>> Hi Philippe,
>>>>>>
>>>>>> just happened to trigger the oops below by running
>>>>>>
>>>>>> gdb --args smokey --run=8
>>>>>>
>>>>>> That run already has troubles and generates different output than
>>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>>> mode switches.
>>>>>
>>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>>
>>>>>  But the real problem is that running the test again
>>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>>
>>>>>
>>>>> I could not reproduce this bug yet.
>>>>>
>>>>> There is no reason for ptracing the application to have any impact on
>>>>> the housekeeping chores when it exits. The backtrace shows that
>>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>>> leading to the fault.
>>>>>
>>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>>> removed from tp->threads upon exit after this sequence took place:
>>>>>
>>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>>> xnsched_forget -> xnsched_tp_forget
>>>>>
>>>>> For that bug to happen, either this assumption has to be wrong, or
>>>>> xnsched_set_policy() is being silly at some point.
>>>>>
>>>>> Is this 100% reproducible on your end, and does this require the initial
>>>>> gdb run to show up, or would that break even when running the sched_tp
>>>>> twice without gdb?
>>>>
>>>> It is always reproducible, also with current next branch. And you need
>>>> to run gdb beforehand, yes.
>>>>
>>>> I'll see if I can look into details.
>>>
>>> During cleanup of the first run under gdb, I get this one as expected
>>> (and two more hits for threads B and C):
>>>
>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>> 175     {
>>> (gdb) p thread->name
>>> $3 = "threadA", '\000' <repeats 24 times>
>>> (gdb) bt
>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>>> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
>>> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
>>> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>>> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>>> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>>> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
>>> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>>> #10 <signal handler called>
>>> #11 0x00007ffff7354146 in ?? ()
>>> #12 0xffff88003cfcde10 in ?? ()
>>> #13 0xffffffff81a09260 in ?? ()
>>> #14 0x0000000000000000 in ?? ()
>>> (gdb) c
>>> Continuing.
>>>
>>>
>>> But then, when I start the test again (with or without gdb), I also get
>>> this right at the beginning:
>>>
>>>
>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>> 175     {
>>> (gdb) bt
>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
>>> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>>> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
>>
>> Yes, this one is the weird one. Normally, we should not find any TCB
>> lingering in tp->threads, once threads A, B and C have exited and
>> unlinked from it via xnsched_forget().
>>
>> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
>> threads currently undergoing a TP schedule to the RT class, since we are
>> about to change the scheduling data (i.e. time windows and partitions).
>> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
>> the next program invocation is what needs to be explained.
> 
> Under gdb supervision, some threads are added to the tp pool multiple
> times:
> 
> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
> 160     {
> (gdb) bt
> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
> #11 <signal handler called>
> #12 0x00007ffff77b7d25 in ?? ()
> #13 0x5a5a5a5a00000001 in ?? ()
> #14 0x0000000000000000 in ?? ()
> (gdb) c
> Continuing.
> 
> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
> 160     {
> (gdb) bt
> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
> #11 <signal handler called>
> #12 0x00007ffff77b7d25 in ?? ()
> #13 0x5a5a5a5a00000001 in ?? ()
> #14 0x0000000000000000 in ?? ()
> 
> 
> Could it be that some syscall is restarted after being interrupted by
> ptrace?

I don't understand the reason for these multiple calls in detail yet, but
you can also trigger the bug by invoking pthread_setschedparam_ex twice
in smokey's sched-tp.c. Fix pushed to for-forge.
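
To illustrate why a second declaration can leave a stale entry behind,
here is a minimal sketch using a toy circular list. It assumes the
declare handler appends the thread's link to the pool and the forget
handler unlinks it; toy_thread, toy_declare, toy_forget and stale_after
are hypothetical names for illustration, not the Cobalt code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy circular doubly linked list, loosely modeled on a kernel-style list. */
struct node {
	struct node *next, *prev;
};

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct node *head)
{
	return head->next == head;
}

/* Link n at the tail. Only correct if n is not already linked. */
static void list_add_tail(struct node *n, struct node *head)
{
	struct node *prev = head->prev;

	n->next = head;
	n->prev = prev;
	prev->next = n;
	head->prev = n;
}

/* Unlink n using only its own neighbour pointers. */
static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical stand-in for a thread control block. */
struct toy_thread {
	struct node tp_link;
};

/* Assumption: declaring appends the thread to the pool. */
static void toy_declare(struct toy_thread *t, struct node *pool)
{
	list_add_tail(&t->tp_link, pool);
}

/* Assumption: forgetting unlinks the thread from the pool. */
static void toy_forget(struct toy_thread *t)
{
	list_del(&t->tp_link);
}

/*
 * Declare the same thread 'declares' times, forget it once, and report
 * whether the pool head still references it afterwards.
 */
static bool stale_after(int declares)
{
	struct node pool;
	struct toy_thread t;
	int i;

	list_init(&pool);
	list_init(&t.tp_link);
	for (i = 0; i < declares; i++)
		toy_declare(&t, &pool);
	toy_forget(&t);

	return !list_is_empty(&pool);
}
```

In this toy model stale_after(1) is false and stale_after(2) is true:
the second append makes the node point at itself, so the single unlink
at exit never touches the pool head, which keeps referencing the node.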

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 17:16           ` Jan Kiszka
@ 2015-03-16 19:24             ` Philippe Gerum
  2015-03-16 19:35               ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2015-03-16 19:24 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai

On 03/16/2015 06:16 PM, Jan Kiszka wrote:
> On 2015-03-16 17:42, Jan Kiszka wrote:
>> On 2015-03-16 17:09, Philippe Gerum wrote:
>>> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>>>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>>>> Hi Philippe,
>>>>>>>
>>>>>>> just happened to trigger the oops below by running
>>>>>>>
>>>>>>> gdb --args smokey --run=8
>>>>>>>
>>>>>>> That run already has troubles and generates different output than
>>>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>>>> mode switches.
>>>>>>
>>>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>>>
>>>>>>  But the real problem is that running the test again
>>>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>>>
>>>>>>
>>>>>> I could not reproduce this bug yet.
>>>>>>
>>>>>> There is no reason for ptracing the application to have any impact on
>>>>>> the housekeeping chores when it exits. The backtrace shows that
>>>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>>>> leading to the fault.
>>>>>>
>>>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>>>> removed from tp->threads upon exit after this sequence took place:
>>>>>>
>>>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>>>> xnsched_forget -> xnsched_tp_forget
>>>>>>
>>>>>> For that bug to happen, either this assumption has to be wrong, or
>>>>>> xnsched_set_policy() is being silly at some point.
>>>>>>
>>>>>> Is this 100% reproducible on your end, and does this require the initial
>>>>>> gdb run to show up, or would that break even when running the sched_tp
>>>>>> twice without gdb?
>>>>>
>>>>> It is always reproducible, also with current next branch. And you need
>>>>> to run gdb beforehand, yes.
>>>>>
>>>>> I'll see if I can look into details.
>>>>
>>>> During cleanup of the first run under gdb, I get this one as expected
>>>> (and two more hits for threads B and C):
>>>>
>>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>> 175     {
>>>> (gdb) p thread->name
>>>> $3 = "threadA", '\000' <repeats 24 times>
>>>> (gdb) bt
>>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>>> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>>>> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
>>>> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
>>>> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>>>> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>>>> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>>>> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
>>>> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>>>> #10 <signal handler called>
>>>> #11 0x00007ffff7354146 in ?? ()
>>>> #12 0xffff88003cfcde10 in ?? ()
>>>> #13 0xffffffff81a09260 in ?? ()
>>>> #14 0x0000000000000000 in ?? ()
>>>> (gdb) c
>>>> Continuing.
>>>>
>>>>
>>>> But then, when I start the test again (with or without gdb), I also get
>>>> this right at the beginning:
>>>>
>>>>
>>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>> 175     {
>>>> (gdb) bt
>>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
>>>> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>>>> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
>>>
>>> Yes, this one is the weird one. Normally, we should not find any TCB
>>> lingering in tp->threads, once threads A, B and C have exited and
>>> unlinked from it via xnsched_forget().
>>>
>>> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
>>> threads currently undergoing a TP schedule to the RT class, since we are
>>> about to change the scheduling data (i.e. time windows and partitions).
>>> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
>>> the next program invocation is what needs to be explained.
>>
>> Under gdb supervision, some threads are added to the tp pool multiple
>> times:
>>
>> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>> 160     {
>> (gdb) bt
>> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
>> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
>> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
>> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
>> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
>> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
>> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
>> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
>> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
>> #11 <signal handler called>
>> #12 0x00007ffff77b7d25 in ?? ()
>> #13 0x5a5a5a5a00000001 in ?? ()
>> #14 0x0000000000000000 in ?? ()
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>> 160     {
>> (gdb) bt
>> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
>> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
>> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
>> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
>> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
>> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
>> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
>> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
>> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
>> #11 <signal handler called>
>> #12 0x00007ffff77b7d25 in ?? ()
>> #13 0x5a5a5a5a00000001 in ?? ()
>> #14 0x0000000000000000 in ?? ()
>>
>>
>> Could it be that some syscall is restarted after being interrupted by
>> ptrace?
> 
> Don't understand the reason for this multiple calls in details yet, but
> you can also trigger the bug by invoking pthread_setschedparam_ex twice
> in smokey's sched-tp.c. Fix pushed to for-forge.
> 

These repeated calls are what has to be fixed: guarding would only paper
over the issue for TP, while the same problem would bite the same way
with other existing or future policies.
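
As a rough sketch of the difference, the check can live in the generic
policy-switch path instead of in each policy's handlers, so every class
benefits. Everything below (toy_class, toy_thread, toy_set_policy, the
counting hooks) is hypothetical and only illustrates the idea; it is not
the actual fix, and the forget-before-declare ordering is arbitrary.

```c
#include <stddef.h>

struct toy_thread;

/* Hypothetical scheduling class with per-class membership hooks. */
struct toy_class {
	void (*declare)(struct toy_thread *t);
	void (*forget)(struct toy_thread *t);
};

struct toy_thread {
	const struct toy_class *cls;	/* class the thread belongs to, or NULL */
	int declared;			/* declare calls minus forget calls */
};

/* Hooks that only track how balanced declare/forget calls are. */
static void count_declare(struct toy_thread *t)
{
	t->declared++;
}

static void count_forget(struct toy_thread *t)
{
	t->declared--;
}

static const struct toy_class toy_tp = { count_declare, count_forget };
static const struct toy_class toy_rt = { count_declare, count_forget };

/*
 * Generic switch: membership hooks run only when the class actually
 * changes, so no individual policy needs its own guard.
 */
static void toy_set_policy(struct toy_thread *t, const struct toy_class *cls)
{
	if (cls == t->cls)
		return;		/* same class: only parameters would change */

	if (t->cls != NULL)
		t->cls->forget(t);

	cls->declare(t);
	t->cls = cls;
}
```

Calling toy_set_policy() twice with &toy_tp on a zero-initialized
toy_thread leaves declared at 1, and a later switch to &toy_rt keeps it
balanced at 1 as well.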

-- 
Philippe.



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 19:24             ` Philippe Gerum
@ 2015-03-16 19:35               ` Jan Kiszka
  2015-03-16 19:41                 ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 19:35 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

On 2015-03-16 20:24, Philippe Gerum wrote:
> On 03/16/2015 06:16 PM, Jan Kiszka wrote:
>> On 2015-03-16 17:42, Jan Kiszka wrote:
>>> On 2015-03-16 17:09, Philippe Gerum wrote:
>>>> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>>>>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>>>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>>>>> Hi Philippe,
>>>>>>>>
>>>>>>>> just happened to trigger the oops below by running
>>>>>>>>
>>>>>>>> gdb --args smokey --run=8
>>>>>>>>
>>>>>>>> That run already has troubles and generates different output than
>>>>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>>>>> mode switches.
>>>>>>>
>>>>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>>>>
>>>>>>>  But the real problem is that running the test again
>>>>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>>>>
>>>>>>>
>>>>>>> I could not reproduce this bug yet.
>>>>>>>
>>>>>>> There is no reason for ptracing the application to have any impact on
>>>>>>> the housekeeping chores when it exits. The backtrace shows that
>>>>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>>>>> leading to the fault.
>>>>>>>
>>>>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>>>>> removed from tp->threads upon exit after this sequence took place:
>>>>>>>
>>>>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>>>>> xnsched_forget -> xnsched_tp_forget
>>>>>>>
>>>>>>> For that bug to happen, either this assumption has to be wrong, or
>>>>>>> xnsched_set_policy() is being silly at some point.
>>>>>>>
>>>>>>> Is this 100% reproducible on your end, and does this require the initial
>>>>>>> gdb run to show up, or would that break even when running the sched_tp
>>>>>>> twice without gdb?
>>>>>>
>>>>>> It is always reproducible, also with current next branch. And you need
>>>>>> to run gdb beforehand, yes.
>>>>>>
>>>>>> I'll see if I can look into details.
>>>>>
>>>>> During cleanup of the first run under gdb, I get this one as expected
>>>>> (and two more hits for threads B and C):
>>>>>
>>>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>> 175     {
>>>>> (gdb) p thread->name
>>>>> $3 = "threadA", '\000' <repeats 24 times>
>>>>> (gdb) bt
>>>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>>>> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>>>>> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
>>>>> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
>>>>> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>>>>> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>>>>> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>>>>> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
>>>>> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>>>>> #10 <signal handler called>
>>>>> #11 0x00007ffff7354146 in ?? ()
>>>>> #12 0xffff88003cfcde10 in ?? ()
>>>>> #13 0xffffffff81a09260 in ?? ()
>>>>> #14 0x0000000000000000 in ?? ()
>>>>> (gdb) c
>>>>> Continuing.
>>>>>
>>>>>
>>>>> But then, when I start the test again (with or without gdb), I also get
>>>>> this right at the beginning:
>>>>>
>>>>>
>>>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>> 175     {
>>>>> (gdb) bt
>>>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
>>>>> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>>>>> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
>>>>
>>>> Yes, this one is the weird one. Normally, we should not find any TCB
>>>> lingering in tp->threads, once threads A, B and C have exited and
>>>> unlinked from it via xnsched_forget().
>>>>
>>>> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
>>>> threads currently undergoing a TP schedule to the RT class, since we are
>>>> about to change the scheduling data (i.e. time windows and partitions).
>>>> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
>>>> the next program invocation is what needs to be explained.
>>>
>>> Under gdb supervision, some threads are added to the tp pool multiple
>>> times:
>>>
>>> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>> 160     {
>>> (gdb) bt
>>> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
>>> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
>>> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
>>> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
>>> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
>>> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
>>> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
>>> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
>>> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
>>> #11 <signal handler called>
>>> #12 0x00007ffff77b7d25 in ?? ()
>>> #13 0x5a5a5a5a00000001 in ?? ()
>>> #14 0x0000000000000000 in ?? ()
>>> (gdb) c
>>> Continuing.
>>>
>>> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>> 160     {
>>> (gdb) bt
>>> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
>>> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
>>> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
>>> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
>>> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
>>> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
>>> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
>>> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
>>> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
>>> #11 <signal handler called>
>>> #12 0x00007ffff77b7d25 in ?? ()
>>> #13 0x5a5a5a5a00000001 in ?? ()
>>> #14 0x0000000000000000 in ?? ()
>>>
>>>
>>> Could it be that some syscall is restarted after being interrupted by
>>> ptrace?
>>
>> Don't understand the reason for this multiple calls in details yet, but
>> you can also trigger the bug by invoking pthread_setschedparam_ex twice
>> in smokey's sched-tp.c. Fix pushed to for-forge.
>>
> 
> This is what has to be fixed, because guarding would only paper over the
> issue for TP, but the latter would bite the same way with other existing
> or future policies.

You mean pushing the guard into xnsched_set_policy(), for example?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 19:35               ` Jan Kiszka
@ 2015-03-16 19:41                 ` Philippe Gerum
  2015-03-16 19:44                   ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2015-03-16 19:41 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai

On 03/16/2015 08:35 PM, Jan Kiszka wrote:
> On 2015-03-16 20:24, Philippe Gerum wrote:
>> On 03/16/2015 06:16 PM, Jan Kiszka wrote:
>>> On 2015-03-16 17:42, Jan Kiszka wrote:
>>>> On 2015-03-16 17:09, Philippe Gerum wrote:
>>>>> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>>>>>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>>>>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>>>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>>>>>> Hi Philippe,
>>>>>>>>>
>>>>>>>>> just happened to trigger the oops below by running
>>>>>>>>>
>>>>>>>>> gdb --args smokey --run=8
>>>>>>>>>
>>>>>>>>> That run already has troubles and generates different output than
>>>>>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>>>>>> mode switches.
>>>>>>>>
>>>>>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>>>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>>>>>
>>>>>>>>> But the real problem is that running the test again
>>>>>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I could not reproduce this bug yet.
>>>>>>>>
>>>>>>>> There is no reason for ptracing the application to have any impact on
>>>>>>>> the housekeeping chores when it exits. The backtrace shows that
>>>>>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>>>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>>>>>> leading to the fault.
>>>>>>>>
>>>>>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>>>>>> removed from tp->threads upon exit after this sequence took place:
>>>>>>>>
>>>>>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>>>>>> xnsched_forget -> xnsched_tp_forget
>>>>>>>>
>>>>>>>> For that bug to happen, either this assumption has to be wrong, or
>>>>>>>> xnsched_set_policy() is being silly at some point.
>>>>>>>>
>>>>>>>> Is this 100% reproducible on your end, and does this require the initial
>>>>>>>> gdb run to show up, or would that break even when running the sched_tp
>>>>>>>> twice without gdb?
>>>>>>>
>>>>>>> It is always reproducible, also with current next branch. And you need
>>>>>>> to run gdb beforehand, yes.
>>>>>>>
>>>>>>> I'll see if I can look into details.
>>>>>>
>>>>>> During cleanup of the first run under gdb, I get this one as expected
>>>>>> (and two more hits for threads B and C):
>>>>>>
>>>>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>>> 175     {
>>>>>> (gdb) p thread->name
>>>>>> $3 = "threadA", '\000' <repeats 24 times>
>>>>>> (gdb) bt
>>>>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>>> #1  0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>>>>> #2  cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>>>>>> #3  __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
>>>>>> #4  0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
>>>>>> #5  0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>>>>>> #6  0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>>>>>> #7  0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>>>>>> #8  0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
>>>>>> #9  SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>>>>>> #10 <signal handler called>
>>>>>> #11 0x00007ffff7354146 in ?? ()
>>>>>> #12 0xffff88003cfcde10 in ?? ()
>>>>>> #13 0xffffffff81a09260 in ?? ()
>>>>>> #14 0x0000000000000000 in ?? ()
>>>>>> (gdb) c
>>>>>> Continuing.
>>>>>>
>>>>>>
>>>>>> But then, when I start the test again (with or without gdb), I also get
>>>>>> this right at the beginning:
>>>>>>
>>>>>>
>>>>>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>>> 175     {
>>>>>> (gdb) bt
>>>>>> #0  xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>>>>>> #1  0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>>>>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
>>>>>> #3  0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>>>>>> #4  0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
>>>>>
>>>>> Yes, this one is the weird one. Normally, we should not find any TCB
>>>>> lingering in tp->threads, once threads A, B and C have exited and
>>>>> unlinked from it via xnsched_forget().
>>>>>
>>>>> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
>>>>> threads currently undergoing a TP schedule to the RT class, since we are
>>>>> about to change the scheduling data (i.e. time windows and partitions).
>>>>> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
>>>>> the next program invocation is what needs to be explained.
>>>>
>>>> Under gdb supervision, some threads are added to the tp pool multiple
>>>> times:
>>>>
>>>> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>>> 160     {
>>>> (gdb) bt
>>>> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>>> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
>>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
>>>> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
>>>> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
>>>> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
>>>> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
>>>> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
>>>> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
>>>> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
>>>> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
>>>> #11 <signal handler called>
>>>> #12 0x00007ffff77b7d25 in ?? ()
>>>> #13 0x5a5a5a5a00000001 in ?? ()
>>>> #14 0x0000000000000000 in ?? ()
>>>> (gdb) c
>>>> Continuing.
>>>>
>>>> Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>>> 160     {
>>>> (gdb) bt
>>>> #0  xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
>>>> #1  0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
>>>> #2  xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
>>>> #3  0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
>>>> #4  0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
>>>> #5  0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
>>>> #6  __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
>>>> #7  0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
>>>> #8  0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
>>>> #9  ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
>>>> #10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
>>>> #11 <signal handler called>
>>>> #12 0x00007ffff77b7d25 in ?? ()
>>>> #13 0x5a5a5a5a00000001 in ?? ()
>>>> #14 0x0000000000000000 in ?? ()
>>>>
>>>>
>>>> Could it be that some syscall is restarted after being interrupted by
>>>> ptrace?
>>>
>>> Don't understand the reason for this multiple calls in details yet, but
>>> you can also trigger the bug by invoking pthread_setschedparam_ex twice
>>> in smokey's sched-tp.c. Fix pushed to for-forge.
>>>
>>
>> This is what has to be fixed, because guarding would only paper over the
>> issue for TP, but the latter would bite the same way with other existing
>> or future policies.
> 
> You mean push the guard into xnsched_set_policy, e.g.?
> 

Or fix what might be a signal restart issue so that we don't end up
twice in xnsched_set_policy().

-- 
Philippe.



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 19:41                 ` Philippe Gerum
@ 2015-03-16 19:44                   ` Jan Kiszka
  2015-03-16 20:00                     ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2015-03-16 19:44 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai

On 2015-03-16 20:41, Philippe Gerum wrote:
> On 03/16/2015 08:35 PM, Jan Kiszka wrote:
>> On 2015-03-16 20:24, Philippe Gerum wrote:
>>> On 03/16/2015 06:16 PM, Jan Kiszka wrote:
>>>> Don't understand the reason for this multiple calls in details yet, but
>>>> you can also trigger the bug by invoking pthread_setschedparam_ex twice
>>>> in smokey's sched-tp.c. Fix pushed to for-forge.
>>>>
>>>
>>> This is what has to be fixed, because guarding would only paper over the
>>> issue for TP, but the latter would bite the same way with other existing
>>> or future policies.
>>
>> You mean push the guard into xnsched_set_policy, e.g.?
>>
> 
> Or fix what might be a signal restart issue so that we don't end up
> twice in xnsched_set_policy().

Again, simply calling pthread_setschedparam_ex() twice in a row on the
same thread already triggers the bug. There is no way around such a guard.
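
To illustrate why a single forget cannot undo a double declare, here is a
minimal user-space sketch. It assumes the TP pool behaves like a
kernel-style circular doubly linked list. The names `list_node` and
`pool_is_stale_after()` are made up for this example and are not Xenomai
code.

```c
#include <assert.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	struct list_node *prev = head->prev;

	head->prev = n;
	n->next = head;
	n->prev = prev;
	prev->next = n;
}

static void list_del(struct list_node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
}

static int list_empty(const struct list_node *head)
{
	return head->next == head;
}

/*
 * Link the same node into the pool 'declares' times, unlink it once
 * (as a single forget at thread exit would), then report whether the
 * pool head still references the node.
 */
static int pool_is_stale_after(int declares)
{
	struct list_node pool, thread_link;
	int i;

	assert(declares >= 1);
	list_init(&pool);
	for (i = 0; i < declares; i++)
		list_add_tail(&thread_link, &pool);
	list_del(&thread_link);

	return !list_empty(&pool);
}
```

With one declare the pool is empty again after the unlink. With two, the
second add makes the node point to itself, so the unlink no longer touches
the pool head and the head keeps referencing the node. Once that node is
freed, a later walk of the pool would hit poisoned memory, which would be
consistent with the "dead" patterns seen in the registers.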

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



* Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
  2015-03-16 19:44                   ` Jan Kiszka
@ 2015-03-16 20:00                     ` Philippe Gerum
  0 siblings, 0 replies; 14+ messages in thread
From: Philippe Gerum @ 2015-03-16 20:00 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai

On 03/16/2015 08:44 PM, Jan Kiszka wrote:
> On 2015-03-16 20:41, Philippe Gerum wrote:
>> On 03/16/2015 08:35 PM, Jan Kiszka wrote:
>>> On 2015-03-16 20:24, Philippe Gerum wrote:
>>>> On 03/16/2015 06:16 PM, Jan Kiszka wrote:
>>>>> Don't understand the reason for this multiple calls in details yet, but
>>>>> you can also trigger the bug by invoking pthread_setschedparam_ex twice
>>>>> in smokey's sched-tp.c. Fix pushed to for-forge.
>>>>>
>>>>
>>>> This is what has to be fixed, because guarding would only paper over the
>>>> issue for TP, but the latter would bite the same way with other existing
>>>> or future policies.
>>>
>>> You mean push the guard into xnsched_set_policy, e.g.?
>>>
>>
>> Or fix what might be a signal restart issue so that we don't end up
>> twice in xnsched_set_policy().
> 
> Again, already calling pthread_setschedparam_ex twice in a row on the
> same thread triggers the bug. There is no way around such a guard.
> 

Then yes, the proper fix should be pushed to xnsched_set_policy() instead:

diff --git a/kernel/cobalt/sched.c b/kernel/cobalt/sched.c
index a6469db..093e3be 100644
--- a/kernel/cobalt/sched.c
+++ b/kernel/cobalt/sched.c
@@ -387,9 +387,11 @@ int xnsched_set_policy(struct xnthread *thread,
 	 * affect the previous class (such as touching thread->rlink
 	 * for instance).
 	 */
-	ret = xnsched_declare(sched_class, thread, p);
-	if (ret)
-		return ret;
+	if (sched_class != thread->base_class) {
+		ret = xnsched_declare(sched_class, thread, p);
+		if (ret)
+			return ret;
+	}

 	/*
 	 * As a special case, we may be called from __xnthread_init()
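
The effect of the guard can be sketched in user space as follows. This is
a simplified model, not the Cobalt code: `struct thread_model`,
`declare_model()` and `set_policy_model()` are hypothetical stand-ins, and
the model assumes the only job of the declare step is to link the thread
into the class pool.

```c
#include <assert.h>

struct sched_class_model {
	const char *name;
};

struct thread_model {
	const struct sched_class_model *base_class;
	int declare_calls;	/* how often the thread was linked into a pool */
};

/* Stand-in for the per-class declare hook; it only counts invocations. */
static int declare_model(const struct sched_class_model *cls,
			 struct thread_model *t)
{
	(void)cls;
	t->declare_calls++;
	return 0;
}

/* Declare only when the thread actually moves to a different class. */
static int set_policy_model(struct thread_model *t,
			    const struct sched_class_model *cls)
{
	int ret;

	if (cls != t->base_class) {
		ret = declare_model(cls, t);
		if (ret)
			return ret;
	}
	t->base_class = cls;

	return 0;
}

/* Start in the RT class, then request the TP class 'repeats' times. */
static int declares_after_repeated_set(int repeats)
{
	static const struct sched_class_model rt = { "rt" }, tp = { "tp" };
	struct thread_model t = { &rt, 0 };
	int i;

	assert(repeats >= 1);
	for (i = 0; i < repeats; i++)
		set_policy_model(&t, &tp);

	return t.declare_calls;
}
```

In this model, repeated requests for the same class leave the declare count
at one, so the thread is linked into the pool only once no matter how often
the call is repeated or restarted.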

-- 
Philippe.



Thread overview: 14+ messages
2015-03-11 14:47 [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb Jan Kiszka
2015-03-11 15:12 ` Philippe Gerum
2015-03-16 14:43 ` Philippe Gerum
2015-03-16 15:31   ` Jan Kiszka
2015-03-16 16:00     ` Jan Kiszka
2015-03-16 16:02       ` Jan Kiszka
2015-03-16 16:09       ` Philippe Gerum
2015-03-16 16:42         ` Jan Kiszka
2015-03-16 17:16           ` Jan Kiszka
2015-03-16 19:24             ` Philippe Gerum
2015-03-16 19:35               ` Jan Kiszka
2015-03-16 19:41                 ` Philippe Gerum
2015-03-16 19:44                   ` Jan Kiszka
2015-03-16 20:00                     ` Philippe Gerum
