* [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
@ 2006-01-21 10:47 Jan Kiszka
  2006-01-21 10:51 ` [Xenomai-core] " Jeroen Van den Keybus
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Jan Kiszka @ 2006-01-21 10:47 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3922 bytes --]

Hi,

well, if I'm not totally wrong, we have a design problem in the
RT-thread hardening path. I dug into the crash Jeroen reported and I'm
quite sure that this is the reason.

So that's the bad news. The good one is that we can at least work around
it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without
CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
 o Shadow threads have only one stack, i.e. one context. If the
   real-time part is active (this includes being blocked on some xnsynch
   object or delayed), the original Linux task must NEVER EVER be
   executed, even if it will immediately fall asleep again. That's
   because the stack is in use by the real-time part at that time. And
   this condition is checked in do_schedule_event() [1].

[Linux]
 o A Linux task which has called set_current_state(<blocking_bit>) will
   remain in the run-queue as long as it calls schedule() on its own.
   This means that it can be preempted (if CONFIG_PREEMPT is set)
   between set_current_state() and schedule() and then even be resumed
   again. Only the explicit call of schedule() will trigger
   deactivate_task() which will in turn remove current from the
   run-queue.
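
A minimal sketch of this pattern (generic 2.6 kernel code, not Xenomai
source; error handling and the wait-queue bookkeeping are omitted):

#include <linux/sched.h>

/* Sketch only: the standard blocking pattern the [Linux] point above
 * refers to. */
static void blocking_wait_sketch(void)
{
    set_current_state(TASK_INTERRUPTIBLE);
    /*
     * Under CONFIG_PREEMPT the task may be preempted right here: it is
     * marked TASK_INTERRUPTIBLE but still linked to the run-queue, so it
     * can even be resumed and run again before it ever reaches schedule().
     */
    schedule();   /* only this call runs deactivate_task() on the task */
}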

Ok, if this is true, let's have a look at xnshadow_harden(): After
grabbing the gatekeeper sem and putting itself in gk->thread, a task
going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
gatekeeper [2]. This does not include a Linux reschedule due to the
_sync version of wake_up_interruptible. What can happen now?
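
Paraphrased, the sequence described above boils down to something like
this (sketch based on the description and on [2]/[3]; the gk field names
are assumptions, not a verbatim copy of shadow.c):

    down(&gk->sem);                          /* grab the gatekeeper sem      */
    gk->thread = xnshadow_thread(current);   /* hand ourselves over to it    */
    set_current_state(TASK_INTERRUPTIBLE);   /* mark ourselves as blocking   */
    wake_up_interruptible_sync(&gk->waitq);  /* [2] wake the gatekeeper, no
                                                immediate Linux reschedule   */
    schedule();                              /* [3] should dequeue us - the
                                                race window spans everything
                                                from set_current_state() up
                                                to this call                 */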

1) No interruption until we have called schedule() [3]. All fine as we
will not be removed from the run-queue before the gatekeeper starts
kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by an RT IRQ. This would just delay the path described
above, even if some RT threads get executed. Once they are finished, we
continue in xnshadow_harden() - given that the RT part does not trigger
the following case:

3) Interruption by some Linux IRQ. This may cause other threads to
become runnable as well, but the gatekeeper has the highest prio and
will therefore be the next. The problem is that the rescheduling on
Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
remove it from the Linux run-queue. And now we are in real trouble: The
gatekeeper will kick off our RT part which will take over the thread's
stack. As soon as the RT domain falls asleep and Linux takes over again,
it will continue our non-RT part as well! Actually, this seems to be the
reason for the panic in do_schedule_event(). Without
CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME
TIME now, thus violating my first assumption. The system gets fatally
corrupted.
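
The check referenced as [1] can be pictured roughly like this (schematic
only; the accessors and the message are assumptions, the real
do_schedule_event() code differs):

    /* Linux is about to switch in `next': its shadow must be relaxed,
     * otherwise the RT side currently owns the shared stack. */
    xnthread_t *shadow = xnshadow_thread(next);

    if (shadow && !xnthread_test_flags(shadow, XNRELAX))
        xnpod_fatal("hardened shadow about to be resumed by Linux");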

Well, I would be happy if someone can prove me wrong here.

The problem is that I don't see a solution because Linux does not
provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
currently considering a hack to remove the migrating Linux thread
manually from the run-queue, but this could easily break the Linux
scheduler.

Jan


PS: Out of curiosity I also checked RTAI's migration mechanism in this
regard. It's similar except for the fact that it does the gatekeeper's
work in the Linux scheduler's tail (i.e. after the next context switch).
And RTAI seems to suffer from the very same race. So this is either a
fundamental issue - or I'm fundamentally wrong.


[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
[2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
[3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Xenomai-core] Re: [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-21 10:47 [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT Jan Kiszka
@ 2006-01-21 10:51 ` Jeroen Van den Keybus
  2006-01-21 16:47 ` [Xenomai-core] " Hannes Mayer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Jeroen Van den Keybus @ 2006-01-21 10:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 147 bytes --]

>
> @Jeroen: Did you verify that your setup also works fine without
> CONFIG_PREEMPT?


Verified. Your workaround works. No more dmesg logs.

[-- Attachment #2: Type: text/html, Size: 327 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-21 10:47 [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT Jan Kiszka
  2006-01-21 10:51 ` [Xenomai-core] " Jeroen Van den Keybus
@ 2006-01-21 16:47 ` Hannes Mayer
  2006-01-21 17:01   ` Jan Kiszka
  2006-01-22  8:10 ` Dmitry Adamushko
  2006-01-29 23:48 ` Philippe Gerum
  3 siblings, 1 reply; 27+ messages in thread
From: Hannes Mayer @ 2006-01-21 16:47 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 805 bytes --]

Jan Kiszka wrote:
[...]
> PS: Out of curiosity I also checked RTAI's migration mechanism in this
> regard. It's similar except for the fact that it does the gatekeeper's
> work in the Linux scheduler's tail (i.e. after the next context switch).
> And RTAI seems it suffers from the very same race. So this is either a
> fundamental issue - or I'm fundamentally wrong.


Well, most of the stuff you guys talk about in this thread is still
beyond my level, but out of curiosity I ported the SEM example to
RTAI (see attached sem.c)
I couldn't come up with something similar to rt_sem_inquire and
rt_task_inquire in RTAI (in "void output(char c)")...
Anyway, unless I have missed something else important while
porting, the example runs flawlessly on RTAI 3.3test3 (kernel 2.6.15).

Best regards,
Hannes.

[-- Attachment #2: sem.c --]
[-- Type: text/x-csrc, Size: 3781 bytes --]

/* TEST_SEM.C ported to RTAI3.3*/

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <math.h>
#include <values.h>

#include <sys/mman.h>

#include <rtai.h>
#include <rtai_sched.h>
#include <rtai_sem.h>

int fd, err;
int t0end = 1;
int t1end = 1;

SEM *s, *m;
float tmax = 1.0e9;

#define CHECK(arg) check(arg, __LINE__)

int check(int r, int n)
{
    if (r != 0)
        fprintf(stderr, "L%d: %s.\n", n, strerror(-r));
    return(r);
}

void output(char c) {
    static int cnt = 0;
    int n;
    char buf[2];
    buf[0] = c;
    if (cnt == 80) {
        buf[1] = '\n';
        n = 2;
        cnt = 0;
    }
    else {
        n = 1;
        cnt++;
    }
/*   
    CHECK(rt_sem_inquire(&m, &seminfo));
    if (seminfo.count != 0) {
        RT_TASK_INFO taskinfo;
        CHECK(rt_task_inquire(NULL, &taskinfo));
        fprintf(stderr, "ALERT: No lock! (count=%ld) Offending task: %s\n",
                seminfo.count, taskinfo.name);
    }
*/  
    if (write(fd, buf, n) != n) {
        fprintf(stderr, "File write error.\n");
        CHECK( rt_sem_signal(s) );
    }
   
}

static void *task0(void *args) {
   RT_TASK *handler;

   if (!(handler = rt_task_init_schmod(nam2num("T0HDLR"), 0, 0, 0, SCHED_FIFO, 0xF))) {
      printf("CANNOT INIT HANDLER TASK > T0HDLR <\n");
      exit(1);
   }
   rt_allow_nonroot_hrt();
   mlockall(MCL_CURRENT | MCL_FUTURE);
   rt_make_hard_real_time();
   t0end = 0;
   rt_task_use_fpu(handler, TASK_USE_FPU );
   while ( !t0end ) {
       rt_sleep((float)rand()*tmax/(float)RAND_MAX);
       rt_sem_wait(m);
       output('0');
       CHECK( rt_sem_signal(m) );
   }
   rt_make_soft_real_time();
   rt_task_delete(handler);
   return 0;
}

static void *task1(void *args) {
   RT_TASK *handler;
   if (!(handler = rt_task_init_schmod(nam2num("T1HDLR"), 0, 0, 0, SCHED_FIFO, 0xF))) {
      printf("CANNOT INIT HANDLER TASK > T1HDLR <\n");
      exit(1);
   }
   rt_allow_nonroot_hrt();
   mlockall(MCL_CURRENT | MCL_FUTURE);
   rt_make_hard_real_time();
   t1end = 0;
   rt_task_use_fpu(handler, TASK_USE_FPU );
   while ( !t1end ) {
       rt_sleep((float)rand()*tmax/(float)RAND_MAX);
       rt_sem_wait(m);
       output('1');
       CHECK( rt_sem_signal(m) );
   }
   rt_make_soft_real_time();
   rt_task_delete(handler);
   return 0;
}


void sighandler(int arg)
{
    CHECK(rt_sem_signal(s));
}

int main(int argc, char *argv[])
{
   RT_TASK *maint; //, *squaretask;
   int t0, t1;
      
   if ((fd = open("dump.txt", O_CREAT | O_TRUNC | O_WRONLY)) < 0)
        fprintf(stderr, "File open error.\n");
   else {
      if (argc == 2) {
         tmax = atof(argv[1]);
         if (tmax == 0.0)
            tmax = 1.0e7;
      }
      rt_set_oneshot_mode();
      start_rt_timer(0);
      m = rt_sem_init(nam2num("MSEM"), 1);
      s = rt_sem_init(nam2num("SSEM"), 0);
      signal(SIGINT, sighandler);
      if (!(maint = rt_task_init(nam2num("MAIN"), 1, 0, 0))) {
         printf("CANNOT INIT MAIN TASK > MAIN <\n");
         exit(1);
      }
      t0 = rt_thread_create(task0, NULL, 10000);  // create thread
      while (t0end) {   // wait until thread went to hard real time
         usleep(100000);
      }
      t1 = rt_thread_create(task1, NULL, 10000);  // create thread
      while (t1end) {   // wait until thread went to hard real time
         usleep(100000);
      }   
      printf("Running for %.2f seconds.\n", (float)MAXLONG/1.0e9);
   
      rt_sem_wait(s);
   
      signal(SIGINT, SIG_IGN);
      t0end = 1;
      t1end = 1;
      printf("TEST ENDS\n");
      CHECK( rt_thread_join(t0) );
      CHECK( rt_thread_join(t1) );
      CHECK(rt_sem_delete(s));
      CHECK(rt_sem_delete(m));
      CHECK( rt_task_delete(maint) );
       close(fd);
   }
   return 0;
}



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-21 16:47 ` [Xenomai-core] " Hannes Mayer
@ 2006-01-21 17:01   ` Jan Kiszka
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kiszka @ 2006-01-21 17:01 UTC (permalink / raw)
  To: Hannes Mayer; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 1316 bytes --]

Hannes Mayer wrote:
> Jan Kiszka wrote:
> [...]
>> PS: Out of curiosity I also checked RTAI's migration mechanism in this
>> regard. It's similar except for the fact that it does the gatekeeper's
>> work in the Linux scheduler's tail (i.e. after the next context switch).
>> And RTAI seems it suffers from the very same race. So this is either a
>> fundamental issue - or I'm fundamentally wrong.
> 
> 
> Well, most of the stuff you guys talk about in this thread is still
> beyond my level, but out of curiosity I ported the SEM example to
> RTAI (see attached sem.c)
> I couldn't come up with something similar to rt_sem_inquire and
> rt_task_inquire in RTAI (in "void output(char c)")...
> Anyway, unless I haven't missed something else important while
> porting, the example runs flawlessly on RTAI 3.3test3 (kernel 2.6.15).
> 

My claim on the RTAI race is based on quick code analysis and a bit
outdated information about its core design. I haven't tried any code to
crash it, and I guess it will take a slightly different test design to
trigger the issue there. As soon as someone could follow my reasoning
and confirm it (don't mind that you did not understand it, I hadn't
either two days ago, this is quite heavy stuff), I will inform Paolo
about this potential problem.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-21 10:47 [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT Jan Kiszka
  2006-01-21 10:51 ` [Xenomai-core] " Jeroen Van den Keybus
  2006-01-21 16:47 ` [Xenomai-core] " Hannes Mayer
@ 2006-01-22  8:10 ` Dmitry Adamushko
  2006-01-22 16:19   ` Jeroen Van den Keybus
  2006-01-29 23:48 ` Philippe Gerum
  3 siblings, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2006-01-22  8:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 4194 bytes --]

> Hi,
>
> well, if I'm not totally wrong, we have a design problem in the
> RT-thread hardening path. I dug into the crash Jeroen reported and I'm
> quite sure that this is the reason.
>
> So that's the bad news. The good one is that we can at least work around
> it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
> it's a 2.6-only issue).
>
>
> But let's start with two assumptions my further analysis is based on:
>
> [Xenomai]
>  o Shadow threads have only one stack, i.e. one context. If the
>    real-time part is active (this includes it is blocked on some xnsynch
>    object or delayed), the original Linux task must NEVER EVER be
>    executed, even if it will immediately fall asleep again. That's
>    because the stack is in use by the real-time part at that time. And
>    this condition is checked in do_schedule_event() [1].
>
> [Linux]
>  o A Linux task which has called set_current_state(<blocking_bit>) will
>    remain in the run-queue as long as it calls schedule() on its own.

Yes, you are right.

Let's keep in mind the following piece of code.

[*]

[code]    from sched.c::schedule()
...
    switch_count = &prev->nivcsw;
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {    <--- MUST BE TRUE FOR A TASK TO BE REMOVED
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                unlikely(signal_pending(prev))))
            prev->state = TASK_RUNNING;
        else {
            if (prev->state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible++;
            deactivate_task(prev, rq);            <--- removing from the active queue
        }
    }
...
[/code]

On executing schedule(), a "current" (prev = current) task is not removed
from the active queue in one of the following cases:

[1] prev->state == 0, i.e. == TASK_RUNNING (since #define TASK_RUNNING  0);

[2] add_preempt_count(PREEMPT_ACTIVE) has been called before calling
    schedule() from the task's context, i.e. from the context of the
    "current" task (prev = current in schedule());

[3] there is a pending signal for the "current" task.

Keeping that in mind too, let's take a look at what happens in your
"crash"-scenario.

> ...
>
> 3) Interruption by some Linux IRQ. This may cause other threads to
> become runnable as well, but the gatekeeper has the highest prio and
> will therefore be the next. The problem is that the rescheduling on
> Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
> remove it from the Linux run-queue.

Right. But what actually happens is the following sequence of calls:

ret_from_intr ---> resume_kernel ---> need_resched --->
sched.c::preempt_schedule_irq() ---> schedule()        (**)

As a result, schedule() is indeed called, but it does not execute the
[*] code - the "current" task is not removed from the active queue.
The reason is [2] (from the list above), and that's done in
preempt_schedule_irq().
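
For reference, the relevant part of preempt_schedule_irq() looks roughly
like this in 2.6 kernels of that era (abridged from memory, see
kernel/sched.c for the exact code):

asmlinkage void __sched preempt_schedule_irq(void)
{
    ...
    add_preempt_count(PREEMPT_ACTIVE);   /* case [2] from the list above  */
    local_irq_enable();
    schedule();                          /* the [*] branch is skipped, so
                                            prev stays on the run-queue   */
    local_irq_disable();
    sub_preempt_count(PREEMPT_ACTIVE);
    ...
}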

> And now we are in real troubles: The gatekeeper will kick off our RT
> part which will take over the thread's stack. As soon as the RT domain
> falls asleep and Linux takes over again, it will continue our non-RT
> part as well! Actually, this seems to be the reason for the panic in
> do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and this check, we
> will run both parts AT THE SAME TIME now, thus violating my first
> assumption. The system gets fatally corrupted.
>
> Well, I would be happy if someone can prove me wrong here.

I'm afraid you are right.


> The problem is that I don't see a solution because Linux does not
> provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
> currently considering a hack to remove the migrating Linux thread
> manually from the run-queue, but this could easily break the Linux
> scheduler.

I have a "stupid" idea on top of my head but I'd prefer to test it on my own
first so not to look as a complete idiot if it's totally wrong. Err... it's
difficult to look more an idiot than I'm already? :o)


> Jan

--
Best regards,
Dmitry Adamushko

[-- Attachment #2: Type: text/html, Size: 5528 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-22  8:10 ` Dmitry Adamushko
@ 2006-01-22 16:19   ` Jeroen Van den Keybus
  2006-01-23 18:22     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Jeroen Van den Keybus @ 2006-01-22 16:19 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Jan Kiszka, xenomai

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

Hello,


I'm currently not at a level to participate in your discussion. Although I'm
willing to supply you with stresstests, I would nevertheless like to learn
more about task migration as this debugging session proceeds. In order to do
so, please confirm the following statements or indicate where I went wrong.
I hope others may learn from this as well.

xn_shadow_harden(): This is called whenever a Xenomai thread performs a
Linux (root domain) system call (notified by Adeos ?). The migrating thread
(nRT) is marked INTERRUPTIBLE and run by the Linux kernel
wake_up_interruptible_sync() call. Is this thread actually run or does it
merely put the thread in some Linux to-do list (I assumed the first case) ?
And how does it terminate: is only the system call migrated or is the thread
allowed to continue running (at a priority level equal to the Xenomai
priority level) until it hits something of the Xenomai API (or trivially:
explicitly go to RT using the API) ? In that case, I expect the nRT thread
to terminate with a schedule() call in the Xeno OS API code which
deactivates the task so that it won't ever run in Linux context anymore. A
top priority gatekeeper is in place as a software hook to catch Linux's
attention right after that schedule(), which might otherwise schedule
something else (and leave only interrupts for Xenomai to come back to life
again). I have the impression that I cannot see this gatekeeper, nor the
(n)RT threads using the ps command ?

Is it correct to state that the current preemption issue is due to the
gatekeeper being invoked too soon ? Could someone knowing more about the
migration technology explain what exactly goes wrong ?

Thanks,


Jeroen.

[-- Attachment #2: Type: text/html, Size: 1939 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-22 16:19   ` Jeroen Van den Keybus
@ 2006-01-23 18:22     ` Gilles Chanteperdrix
  2006-01-23 19:16       ` Jan Kiszka
  2006-01-24 13:14       ` Dmitry Adamushko
  0 siblings, 2 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2006-01-23 18:22 UTC (permalink / raw)
  To: Jeroen Van den Keybus; +Cc: Jan Kiszka, xenomai

Jeroen Van den Keybus wrote:
 > Hello,
 > 
 > 
 > I'm currently not at a level to participate in your discussion. Although I'm
 > willing to supply you with stresstests, I would nevertheless like to learn
 > more from task migration as this debugging session proceeds. In order to do
 > so, please confirm the following statements or indicate where I went wrong.
 > I hope others may learn from this as well.
 > 
 > xn_shadow_harden(): This is called whenever a Xenomai thread performs a
 > Linux (root domain) system call (notified by Adeos ?). 

xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system call tables associate a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
 > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
 > wake_up_interruptible_sync() call. Is this thread actually run or does it
 > merely put the thread in some Linux to-do list (I assumed the first case) ?

Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync, the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper) will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as
suspended is not needed, since the gatekeeper may have a high priority,
and calling schedule() is enough. In any case, the woken up thread does
not seem to be run immediately, so this rather looks like the second
case.

Since in xnshadow_harden, the running thread marks itself as suspended
before running wake_up_interruptible_sync, the gatekeeper will run when
schedule() gets called, which, in turn, depends on the CONFIG_PREEMPT*
configuration. In the non-preempt case, the current thread will be
suspended and the gatekeeper will run when schedule() is explicitly
called in xnshadow_harden(). In the preempt case, schedule gets called
when the outermost spinlock is unlocked in wake_up_interruptible_sync().
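
The preempt case mentioned above goes through the preempt_enable() path,
roughly as follows in 2.6-era include/linux/preempt.h (abridged from
memory); note that it only ends up in schedule() if TIF_NEED_RESCHED is
already set at that point:

#define preempt_enable() \
do { \
    preempt_enable_no_resched(); \
    preempt_check_resched(); \
} while (0)

#define preempt_check_resched() \
do { \
    if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
        preempt_schedule(); \
} while (0)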

 > And how does it terminate: is only the system call migrated or is the thread
 > allowed to continue run (at a priority level equal to the Xenomai
 > priority level) until it hits something of the Xenomai API (or trivially:
 > explicitly go to RT using the API) ? 

I am not sure I follow you here. The usual case is that the thread will
remain in primary mode after the system call, but I think a system call
flag allows the other behaviour. So, if I understand the question
correctly, the answer is that it depends on the system call.

 > In that case, I expect the nRT thread to terminate with a schedule()
 > call in the Xeno OS API code which deactivates the task so that it
 > won't ever run in Linux context anymore. A top priority gatekeeper is
 > in place as a software hook to catch Linux's attention right after
 > that schedule(), which might otherwise schedule something else (and
 > leave only interrupts for Xenomai to come back to life again).

Here is the way I understand it. We have two threads, or rather two
"views" of the same thread, with each its state. Switching from
secondary to primary mode, i.e. xnshadow_harden and gatekeeper job,
means changing the two states at once. Since we can not do that, we need
an intermediate state. Since the intermediate state can not be the state
where the two threads are running (they share the same stack and
program counter), the intermediate state is a state where the two
threads are suspended, and another context needs to run: the
gatekeeper.
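
Schematically, the gatekeeper's side of this hand-over could be pictured
like this (illustrative sketch only, not the actual shadow.c gatekeeper;
the gk fields are assumed names):

static int gatekeeper_sketch(void *arg)
{
    for (;;) {
        wait_event_interruptible(gk->waitq, gk->thread != NULL);
        /* By now the migrating task is expected to have left the Linux
         * run-queue via schedule().  Resume its shadow on the Xenomai
         * side and let the nucleus reschedule. */
        xnpod_resume_thread(gk->thread, XNRELAX);
        gk->thread = NULL;
        xnpod_schedule();
    }
    return 0;
}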

 >  I have
 > the impression that I cannot see this gatekeeper, nor the (n)RT
 > threads using the ps command ?

The gatekeeper and Xenomai user-space threads are regular Linux
contexts; you can see them using the ps command.

 > 
 > Is it correct to state that the current preemption issue is due to the
 > gatekeeper being invoked too soon ? Could someone knowing more about the
 > migration technology explain what exactly goes wrong ?

Jan seems to have found such an issue here. I am not sure I understood
what he wrote. But if the issue is due to CONFIG_PREEMPT, it explains
why I could not observe the bug: I only have the "voluntary preempt"
option enabled.

I will now try and activate CONFIG_PREEMPT, so as to try and understand
what Jan wrote, and tell you more later.

-- 


					    Gilles Chanteperdrix.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-23 18:22     ` Gilles Chanteperdrix
@ 2006-01-23 19:16       ` Jan Kiszka
  2006-01-30 14:51         ` Philippe Gerum
  2006-01-24 13:14       ` Dmitry Adamushko
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2006-01-23 19:16 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 5946 bytes --]

Gilles Chanteperdrix wrote:
> Jeroen Van den Keybus wrote:
>  > Hello,
>  > 
>  > 
>  > I'm currently not at a level to participate in your discussion. Although I'm
>  > willing to supply you with stresstests, I would nevertheless like to learn
>  > more from task migration as this debugging session proceeds. In order to do
>  > so, please confirm the following statements or indicate where I went wrong.
>  > I hope others may learn from this as well.
>  > 
>  > xn_shadow_harden(): This is called whenever a Xenomai thread performs a
>  > Linux (root domain) system call (notified by Adeos ?). 
> 
> xnshadow_harden() is called whenever a thread running in secondary
> mode (that is, running as a regular Linux thread, handled by Linux
> scheduler) is switching to primary mode (where it will run as a Xenomai
> thread, handled by Xenomai scheduler). Migrations occur for some system
> calls. More precisely, Xenomai skin system calls tables associates a few
> flags with each system call, and some of these flags cause migration of
> the caller when it issues the system call.
> 
> Each Xenomai user-space thread has two contexts, a regular Linux
> thread context, and a Xenomai thread called "shadow" thread. Both
> contexts share the same stack and program counter, so that at any time,
> at least one of the two contexts is seen as suspended by the scheduler
> which handles it.
> 
> Before xnshadow_harden is called, the Linux thread is running, and its
> shadow is seen in suspended state with XNRELAX bit by Xenomai
> scheduler. After xnshadow_harden, the Linux context is seen suspended
> with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
> running by Xenomai scheduler.
> 
> The migrating thread
>  > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
>  > wake_up_interruptible_sync() call. Is this thread actually run or does it
>  > merely put the thread in some Linux to-do list (I assumed the first case) ?
> 
> Here, I am not sure, but it seems that when calling
> wake_up_interruptible_sync the woken up task is put in the current CPU
> runqueue, and this task (i.e. the gatekeeper), will not run until the
> current thread (i.e. the thread running xnshadow_harden) marks itself as
> suspended and calls schedule(). Maybe, marking the running thread as

Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to provoke the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it, this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.
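
For reference, the two variants only differ in the "sync" hint they pass
down to try_to_wake_up(); roughly, from 2.6-era include/linux/wait.h
(quoted from memory):

#define wake_up_interruptible(x)        __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
#define wake_up_interruptible_sync(x)   __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)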

> suspended is not needed, since the gatekeeper may have a high priority,
> and calling schedule() is enough. In any case, the waken up thread does
> not seem to be run immediately, so this rather look like the second
> case.
> 
> Since in xnshadow_harden, the running thread marks itself as suspended
> before running wake_up_interruptible_sync, the gatekeeper will run when
> schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
> configuration. In the non-preempt case, the current thread will be
> suspended and the gatekeeper will run when schedule() is explicitely
> called in xnshadow_harden(). In the preempt case, schedule gets called
> when the outermost spinlock is unlocked in wake_up_interruptible_sync().
> 
>  > And how does it terminate: is only the system call migrated or is the thread
>  > allowed to continue run (at a priority level equal to the Xenomai
>  > priority level) until it hits something of the Xenomai API (or trivially:
>  > explicitly go to RT using the API) ? 
> 
> I am not sure I follow you here. The usual case is that the thread will
> remain in primary mode after the system call, but I think a system call
> flag allow the other behaviour. So, if I understand the question
> correctly, the answer is that it depends on the system call.
> 
>  > In that case, I expect the nRT thread to terminate with a schedule()
>  > call in the Xeno OS API code which deactivates the task so that it
>  > won't ever run in Linux context anymore. A top priority gatekeeper is
>  > in place as a software hook to catch Linux's attention right after
>  > that schedule(), which might otherwise schedule something else (and
>  > leave only interrupts for Xenomai to come back to life again).
> 
> Here is the way I understand it. We have two threads, or rather two
> "views" of the same thread, with each its state. Switching from
> secondary to primary mode, i.e. xnshadow_harden and gatekeeper job,
> means changing the two states at once. Since we can not do that, we need
> an intermediate state. Since the intermediate state can not be the state
> where the two threads are running (they share the same stack and
> program counter), the intermediate state is a state where the two
> threads are suspended, but another context needs running, it is the
> gatekeeper.
> 
>  >  I have
>  > the impression that I cannot see this gatekeeper, nor the (n)RT
>  > threads using the ps command ?
> 
> The gatekeeper and Xenomai user-space threads are regular Linux
> contexts, you can seen them using the ps command.
> 
>  > 
>  > Is it correct to state that the current preemption issue is due to the
>  > gatekeeper being invoked too soon ? Could someone knowing more about the
>  > migration technology explain what exactly goes wrong ?
> 
> Jan seems to have found such an issue here. I am not sure I understood
> what he wrote. But if the issue is due to CONFIG_PREEMPT, it explains
> why I could not observe the bug, I only have the "voluntary preempt"
> option enabled.
> 
> I will now try and activate CONFIG_PREEMPT, so as to try and understand
> what Jan wrote, and tell you more later.
> 

Hardly anyone understands me, it's so sad... ;(

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-23 18:22     ` Gilles Chanteperdrix
  2006-01-23 19:16       ` Jan Kiszka
@ 2006-01-24 13:14       ` Dmitry Adamushko
  2006-01-24 13:26         ` Jan Kiszka
  1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2006-01-24 13:14 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

[-- Attachment #1: Type: text/plain, Size: 2333 bytes --]

On 23/01/06, Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
>
> Jeroen Van den Keybus wrote:
> > Hello,



> [ skip-skip-skip ]
>


> Since in xnshadow_harden, the running thread marks itself as suspended
> before running wake_up_interruptible_sync, the gatekeeper will run when
> schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
> configuration. In the non-preempt case, the current thread will be
> suspended and the gatekeeper will run when schedule() is explicitely
> called in xnshadow_harden(). In the preempt case, schedule gets called
> when the outermost spinlock is unlocked in wake_up_interruptible_sync().


In fact, no.

wake_up_interruptible_sync() doesn't set the need_resched "flag". That's
why it's "sync", actually.

Only if need_resched was already set before calling
wake_up_interruptible_sync() - then yes.

The sequence is as follows:

wake_up_interruptible_sync ---> wake_up_sync ---> wake_up_common(...,
sync=1, ...) ---> ... ---> try_to_wake_up(..., sync=1)

Look at the end of  try_to_wake_up() to see when it calls resched_task().
The comment there speaks for itself.

So let's suppose need_resched == 0 (it's per-task of course).
As a result of wake_up_interruptible_sync() the new task is added to the
current active run-queue, but need_resched remains unset in the hope
that the waker will call schedule() on its own soon.
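
The relevant tail of try_to_wake_up() reads roughly as follows in that
kernel (abridged from memory; see kernel/sched.c for the exact code and
the comment mentioned above):

    activate_task(p, rq, cpu == this_cpu);
    /*
     * Sync wakeups: the waker promises to call schedule() shortly, so
     * don't flag a reschedule if the woken task runs on this CPU -
     * hence need_resched stays unset, as described above.
     */
    if (!sync || cpu != this_cpu) {
        if (TASK_PREEMPTS_CURR(p, rq))
            resched_task(rq->curr);
    }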

I have CONFIG_PREEMPT set on my machine but I have never encountered the
bug described by Jan.

The catalyst of the problem, I guess, is that some IRQ interrupts a task
between wake_up_interruptible_sync() and schedule() and its ISR, in turn,
wakes up another task whose prio is higher than the one of our waker (as a
result, the need_resched flag is set). And now, rescheduling occurs on
return from the irq handling code (ret_from_intr -> ... ->
preempt_schedule_irq() -> schedule()).

Some events should coincide, yep. But I guess that problem does not occur
every time?

I have not checked it yet, but my presupposition is that something as easy as:

preempt_disable()

wake_up_interruptible_sync();
schedule();

preempt_enable();


could work... err... and don't blame me if not, it's someone else who has
written that nonsense :o)

--
Best regards,
Dmitry Adamushko

[-- Attachment #2: Type: text/html, Size: 3153 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-24 13:14       ` Dmitry Adamushko
@ 2006-01-24 13:26         ` Jan Kiszka
  2006-01-30 11:37           ` Dmitry Adamushko
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2006-01-24 13:26 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 3246 bytes --]

Dmitry Adamushko wrote:
> On 23/01/06, Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
>> Jeroen Van den Keybus wrote:
>>> Hello,
> 
> 
> 
>> [ skip-skip-skip ]
>>
> 
> 
>> Since in xnshadow_harden, the running thread marks itself as suspended
>> before running wake_up_interruptible_sync, the gatekeeper will run when
>> schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
>> configuration. In the non-preempt case, the current thread will be
>> suspended and the gatekeeper will run when schedule() is explicitely
>> called in xnshadow_harden(). In the preempt case, schedule gets called
>> when the outermost spinlock is unlocked in wake_up_interruptible_sync().
> 
> 
> In fact, no.
> 
> wake_up_interruptible_sync() doesn't set the need_resched "flag" up. That's
> why it's "sync" actually.
> 
> Only if the need_resched was already set before calling
> wake_up_interruptible_sync(), then yes.
> 
> The secuence is as follows :
> 
> wake_up_interruptible_sync ---> wake_up_sync ---> wake_up_common(...,
> sync=1, ...) ---> ... ---> try_to_wake_up(..., sync=1)
> 
> Look at the end of  try_to_wake_up() to see when it calls resched_task().
> The comment there speaks for itself.
> 
> So let's suppose need_resched == 0 (it's per-task of course).
> As a result of wake_up_interruptible_sync() the new task is added to the
> current active run-queue but need_resched remains to be unset in the hope
> that the waker will call schedule() on its own soon.
> 
> I have CONFIG_PREEMPT set on my machine but I have never encountered a bug
> described by Jan.
> 
> The catalyst of the problem,  I guess, is  that some IRQ interrupts a task
> between wake_up_interruptible_sync() and schedule() and its ISR, in turn,
> wakes up another task which prio is higher than the one of our waker (as a
> result, the need_resched flag is set). And now, rescheduling occurs on
> return from irq handling code (ret_from_intr -> ...-> preempt_irq_schedule()
> -> schedule()).

Yes, this is exactly what happened. I unfortunately have not saved a
related trace I took with the extended ipipe-tracer (the one I sent ends
too early), but they showed a preemption right after the wake_up, first
by one of the other real-time threads in Jeroen's scenario, and then, as
a result of some xnshadow_relax() of that thread, a Linux
preempt_schedule to the gatekeeper. We do not see this bug that often as
it requires a specific load and it must hit a really small race window.

> 
> Some events should coincide, yep. But I guess that problem does not occur
> every time?
> 
> I have not checked it yet but my presupposition that something as easy as :
> 
> preempt_disable()
> 
> wake_up_interruptible_sync();
> schedule();
> 
> preempt_enable();

It's a no-go: "scheduling while atomic". One of my first attempts to
solve it.

The only way to enter schedule() without being preemptible is via
PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
Kind of Gordian knot. :(

> 
> 
> could work... err.. and don't blame me if no, it's some one else who has
> written that nonsense :o)
> 
> --
> Best regards,
> Dmitry Adamushko
> 

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-21 10:47 [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT Jan Kiszka
                   ` (2 preceding siblings ...)
  2006-01-22  8:10 ` Dmitry Adamushko
@ 2006-01-29 23:48 ` Philippe Gerum
  2006-01-30 10:14   ` Philippe Gerum
  3 siblings, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2006-01-29 23:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Hi,
> 
> well, if I'm not totally wrong, we have a design problem in the
> RT-thread hardening path. I dug into the crash Jeroen reported and I'm
> quite sure that this is the reason.
> 
> So that's the bad news. The good one is that we can at least work around
> it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
> it's a 2.6-only issue).
> 
> @Jeroen: Did you verify that your setup also works fine without
> CONFIG_PREEMPT?
> 
> But let's start with two assumptions my further analysis is based on:
> 
> [Xenomai]
>  o Shadow threads have only one stack, i.e. one context. If the
>    real-time part is active (this includes it is blocked on some xnsynch
>    object or delayed), the original Linux task must NEVER EVER be
>    executed, even if it will immediately fall asleep again. That's
>    because the stack is in use by the real-time part at that time. And
>    this condition is checked in do_schedule_event() [1].
> 
> [Linux]
>  o A Linux task which has called set_current_state(<blocking_bit>) will
>    remain in the run-queue as long as it calls schedule() on its own.
>    This means that it can be preempted (if CONFIG_PREEMPT is set)
>    between set_current_state() and schedule() and then even be resumed
>    again. Only the explicit call of schedule() will trigger
>    deactivate_task() which will in turn remove current from the
>    run-queue.
> 
> Ok, if this is true, let's have a look at xnshadow_harden(): After
> grabbing the gatekeeper sem and putting itself in gk->thread, a task
> going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
> gatekeeper [2]. This does not include a Linux reschedule due to the
> _sync version of wake_up_interruptible. What can happen now?
> 
> 1) No interruption until we can called schedule() [3]. All fine as we
> will not be removed from the run-queue before the gatekeeper starts
> kicking our RT part, thus no conflict in using the thread's stack.
> 
> 3) Interruption by a RT IRQ. This would just delay the path described
> above, even if some RT threads get executed. Once they are finished, we
> continue in xnshadow_harden() - given that the RT part does not trigger
> the following case:
> 
> 3) Interruption by some Linux IRQ. This may cause other threads to
> become runnable as well, but the gatekeeper has the highest prio and
> will therefore be the next. The problem is that the rescheduling on
> Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
> remove it from the Linux run-queue. And now we are in real troubles: The
> gatekeeper will kick off our RT part which will take over the thread's
> stack. As soon as the RT domain falls asleep and Linux takes over again,
> it will continue our non-RT part as well! Actually, this seems to be the
> reason for the panic in do_schedule_event(). Without
> CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME
> TIME now, thus violating my first assumption. The system gets fatally
> corrupted.
>

Yep, that's it. And we may not lock out the interrupts before calling schedule to 
prevent that.

> Well, I would be happy if someone can prove me wrong here.
> 
> The problem is that I don't see a solution because Linux does not
> provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
> currently considering a hack to remove the migrating Linux thread
> manually from the run-queue, but this could easily break the Linux
> scheduler.
> 

Maybe the best way would be to provide atomic wakeup-and-schedule support into the 
Adeos patch for Linux tasks; previous attempts to fix this by circumventing the 
potential for preemption from outside of the scheduler code have all failed, and 
this bug is uselessly lingering for that reason.

> Jan
> 
> 
> PS: Out of curiosity I also checked RTAI's migration mechanism in this
> regard. It's similar except for the fact that it does the gatekeeper's
> work in the Linux scheduler's tail (i.e. after the next context switch).
> And RTAI seems it suffers from the very same race. So this is either a
> fundamental issue - or I'm fundamentally wrong.
> 
> 
> [1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
> [2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
> [3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core


-- 

Philippe.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-29 23:48 ` Philippe Gerum
@ 2006-01-30 10:14   ` Philippe Gerum
  0 siblings, 0 replies; 27+ messages in thread
From: Philippe Gerum @ 2006-01-30 10:14 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
> Jan Kiszka wrote:
> 
>> Hi,
>>
>> well, if I'm not totally wrong, we have a design problem in the
>> RT-thread hardening path. I dug into the crash Jeroen reported and I'm
>> quite sure that this is the reason.
>>
>> So that's the bad news. The good one is that we can at least work around
>> it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
>> it's a 2.6-only issue).
>>
>> @Jeroen: Did you verify that your setup also works fine without
>> CONFIG_PREEMPT?
>>
>> But let's start with two assumptions my further analysis is based on:
>>
>> [Xenomai]
>>  o Shadow threads have only one stack, i.e. one context. If the
>>    real-time part is active (this includes it is blocked on some xnsynch
>>    object or delayed), the original Linux task must NEVER EVER be
>>    executed, even if it will immediately fall asleep again. That's
>>    because the stack is in use by the real-time part at that time. And
>>    this condition is checked in do_schedule_event() [1].
>>
>> [Linux]
>>  o A Linux task which has called set_current_state(<blocking_bit>) will
>>    remain in the run-queue as long as it calls schedule() on its own.
>>    This means that it can be preempted (if CONFIG_PREEMPT is set)
>>    between set_current_state() and schedule() and then even be resumed
>>    again. Only the explicit call of schedule() will trigger
>>    deactivate_task() which will in turn remove current from the
>>    run-queue.
>>
>> Ok, if this is true, let's have a look at xnshadow_harden(): After
>> grabbing the gatekeeper sem and putting itself in gk->thread, a task
>> going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
>> gatekeeper [2]. This does not include a Linux reschedule due to the
>> _sync version of wake_up_interruptible. What can happen now?
>>
>> 1) No interruption until we can called schedule() [3]. All fine as we
>> will not be removed from the run-queue before the gatekeeper starts
>> kicking our RT part, thus no conflict in using the thread's stack.
>>
>> 3) Interruption by a RT IRQ. This would just delay the path described
>> above, even if some RT threads get executed. Once they are finished, we
>> continue in xnshadow_harden() - given that the RT part does not trigger
>> the following case:
>>
>> 3) Interruption by some Linux IRQ. This may cause other threads to
>> become runnable as well, but the gatekeeper has the highest prio and
>> will therefore be the next. The problem is that the rescheduling on
>> Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
>> remove it from the Linux run-queue. And now we are in real troubles: The
>> gatekeeper will kick off our RT part which will take over the thread's
>> stack. As soon as the RT domain falls asleep and Linux takes over again,
>> it will continue our non-RT part as well! Actually, this seems to be the
>> reason for the panic in do_schedule_event(). Without
>> CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME
>> TIME now, thus violating my first assumption. The system gets fatally
>> corrupted.
>>
> 
> Yep, that's it. And we may not lock out the interrupts before calling 
> schedule to prevent that.
> 
>> Well, I would be happy if someone can prove me wrong here.
>>
>> The problem is that I don't see a solution because Linux does not
>> provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
>> currently considering a hack to remove the migrating Linux thread
>> manually from the run-queue, but this could easily break the Linux
>> scheduler.
>>
> 
> Maybe the best way would be to provide atomic wakeup-and-schedule 
> support into the Adeos patch for Linux tasks; previous attempts to fix 
> this by circumventing the potential for preemption from outside of the 
> scheduler code have all failed, and this bug is uselessly lingering for 
> that reason.

Having slept on this, I'm going to add a simple extension to the Linux scheduler 
available from Adeos, in order to get an atomic/unpreemptable path from the 
statement when the current task's state is changed for suspension (e.g. 
TASK_INTERRUPTIBLE), to the point where schedule() normally enters its atomic 
section, which looks like the sanest way to solve this issue, i.e. without gory 
hackery all over the place. Patch will follow later for testing this approach.
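
To illustrate the idea (purely hypothetical sketch; TASK_ATOMICSWITCH is
a placeholder name, not the actual patch), the hardening path could then
look like:

    gk->thread = xnshadow_thread(current);
    set_current_state(TASK_INTERRUPTIBLE | TASK_ATOMICSWITCH);
    wake_up_interruptible_sync(&gk->waitq);
    schedule();   /* the patched scheduler keeps the path from the state
                     change down to its own atomic section preemption-free,
                     so deactivate_task() is guaranteed to run before the
                     gatekeeper can resume the shadow side */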

> 
>> Jan
>>
>>
>> PS: Out of curiosity I also checked RTAI's migration mechanism in this
>> regard. It's similar except for the fact that it does the gatekeeper's
>> work in the Linux scheduler's tail (i.e. after the next context switch).
>> And RTAI seems it suffers from the very same race. So this is either a
>> fundamental issue - or I'm fundamentally wrong.
>>
>>
>> [1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573 
>>
>> [2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461 
>>
>> [3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481 
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Xenomai-core mailing list
>> Xenomai-core@domain.hid
>> https://mail.gna.org/listinfo/xenomai-core
> 
> 
> 


-- 

Philippe.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-24 13:26         ` Jan Kiszka
@ 2006-01-30 11:37           ` Dmitry Adamushko
  2006-01-30 11:48             ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Adamushko @ 2006-01-30 11:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1170 bytes --]

>> ...

> > I have not checked it yet but my presupposition that something as
> > easy as :
> >
> > preempt_disable()
> >
> > wake_up_interruptible_sync();
> > schedule();
> >
> > preempt_enable();
>
> It's a no-go: "scheduling while atomic". One of my first attempts to
> solve it.


My fault. I meant the way preempt_schedule() and preempt_schedule_irq()
call schedule() while being non-preemptible.
To this end, PREEMPT_ACTIVE is set up.
The use of preempt_enable/disable() here is wrong.


> The only way to enter schedule() without being preemptible is via
> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
> Kind of Gordian knot. :(


Maybe I have missed something, so just out of curiosity: what does prevent
the use of PREEMPT_ACTIVE here?
We don't get a "preempted while atomic" message here, as it seems to be a
legal way to call schedule() with that flag being set up.


>
> >
> > could work... err.. and don't blame me if no, it's some one else who has
> > written that nonsense :o)
> >
> > --
> > Best regards,
> > Dmitry Adamushko
> >
>
> Jan
>
>
>
>


--
Best regards,
Dmitry Adamushko

[-- Attachment #2: Type: text/html, Size: 1780 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 11:37           ` Dmitry Adamushko
@ 2006-01-30 11:48             ` Jan Kiszka
  2006-01-30 13:02               ` Dmitry Adamushko
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2006-01-30 11:48 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1611 bytes --]

Dmitry Adamushko wrote:
>>> ...
> 
>> I have not checked it yet but my presupposition that something as
>> easy as :
>>> preempt_disable()
>>>
>>> wake_up_interruptible_sync();
>>> schedule();
>>>
>>> preempt_enable();
>> It's a no-go: "scheduling while atomic". One of my first attempts to
>> solve it.
> 
> 
> My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call
> schedule() while being non-preemptible.
> To this end, ACTIVE_PREEMPT is set up.
> The use of preempt_enable/disable() here is wrong.
> 
> 
>> The only way to enter schedule() without being preemptible is via
>> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
>> Kind of Gordian knot. :(
> 
> 
> Maybe I have missed something so just for my curiosity : what does prevent
> the use of PREEMPT_ACTIVE here?
> We don't have a "preempted while atomic" message here as it seems to be a
> legal way to call schedule() with that flag being set up.

When PREEMPT_ACTIVE is set, the task gets /preempted/ but not removed
from the run-queue - independent of its current state.

> 
> 
>>> could work... err.. and don't blame me if no, it's some one else who has
>>> written that nonsense :o)
>>>
>>> --
>>> Best regards,
>>> Dmitry Adamushko
>>>
>> Jan
>>
>>
>>
>>
> 
> 
> --
> Best regards,
> Dmitry Adamushko
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 11:48             ` Jan Kiszka
@ 2006-01-30 13:02               ` Dmitry Adamushko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Adamushko @ 2006-01-30 13:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 3268 bytes --]

On 30/01/06, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>
> Dmitry Adamushko wrote:
> >>> ...
> >
> >> I have not checked it yet but my presupposition that something as
> >> easy as :
> >>> preempt_disable()
> >>>
> >>> wake_up_interruptible_sync();
> >>> schedule();
> >>>
> >>> preempt_enable();
> >> It's a no-go: "scheduling while atomic". One of my first attempts to
> >> solve it.
> >
> >
> > My fault. I meant the way preempt_schedule() and preempt_irq_schedule()
> > call schedule() while being non-preemptible.
> > To this end, ACTIVE_PREEMPT is set up.
> > The use of preempt_enable/disable() here is wrong.
> >
> >
> >> The only way to enter schedule() without being preemptible is via
> >> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
> >> Kind of Gordian knot. :(
> >
> >
> > Maybe I have missed something so just for my curiosity : what does
> > prevent the use of PREEMPT_ACTIVE here?
> > We don't have a "preempted while atomic" message here as it seems to
> > be a legal way to call schedule() with that flag being set up.
>
> When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from
> the run queue - independent of its current status.


Err... that's exactly the reason I explained in my first mail in this
thread :) Blah... I wish I had been smoking something special before, so I
could point to that as the reason for my forgetfulness.

Actually, we could indeed use PREEMPT_ACTIVE + something else (probably
another flag) to distinguish between the case when PREEMPT_ACTIVE is set by
Linux and the case when it's set by xnshadow_harden().

xnshadow_harden()
{
struct task_struct *this_task = current;
...
xnthread_t *thread = xnshadow_thread(this_task);

if (!thread)
    return;

...
gk->thread = thread;

+ add_preempt_count(PREEMPT_ACTIVE);

// should be checked in schedule()
+ xnthread_set_flags(thread, XNATOMIC_TRANSIT);

set_current_state(TASK_INTERRUPTIBLE);
wake_up_interruptible_sync(&gk->waitq);
+ schedule();

+ sub_preempt_count(PREEMPT_ACTIVE);
...
}

Then, something like the following code should be called from schedule() :

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
xnthread_t *thread = xnshadow_thread(task);

if (!thread)
    return;

if (xnthread_test_flags(thread, XNATOMIC_TRANSIT))
    {
    xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
    deactivate_task(task, rq);
    }
}

-----

schedule.c :
...
    switch_count = &prev->nivcsw;
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                unlikely(signal_pending(prev))))
            prev->state = TASK_RUNNING;
        else {
            if (prev->state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible++;
            deactivate_task(prev, rq);
        }
    }

// removes a task from the active queue if PREEMPT_ACTIVE + XNATOMIC_TRANSIT

+ #ifdef CONFIG_IPIPE
+ ipipe_transit_cleanup(prev, rq);
+ #endif /* CONFIG_IPIPE */
...

Not very graceful maybe, but it could work - or am I missing something
important?

--
Best regards,
Dmitry Adamushko

[-- Attachment #2: Type: text/html, Size: 4629 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-23 19:16       ` Jan Kiszka
@ 2006-01-30 14:51         ` Philippe Gerum
  2006-01-30 15:33           ` Philippe Gerum
  2006-01-30 15:35           ` Philippe Gerum
  0 siblings, 2 replies; 27+ messages in thread
From: Philippe Gerum @ 2006-01-30 14:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
> 
>>Jeroen Van den Keybus wrote:
>> > Hello,
>> > 
>> > 
>> > I'm currently not at a level to participate in your discussion. Although I'm
>> > willing to supply you with stresstests, I would nevertheless like to learn
>> > more from task migration as this debugging session proceeds. In order to do
>> > so, please confirm the following statements or indicate where I went wrong.
>> > I hope others may learn from this as well.
>> > 
>> > xn_shadow_harden(): This is called whenever a Xenomai thread performs a
>> > Linux (root domain) system call (notified by Adeos ?). 
>>
>>xnshadow_harden() is called whenever a thread running in secondary
>>mode (that is, running as a regular Linux thread, handled by Linux
>>scheduler) is switching to primary mode (where it will run as a Xenomai
>>thread, handled by Xenomai scheduler). Migrations occur for some system
>>calls. More precisely, Xenomai skin system call tables associate a few
>>flags with each system call, and some of these flags cause migration of
>>the caller when it issues the system call.
>>
>>Each Xenomai user-space thread has two contexts, a regular Linux
>>thread context, and a Xenomai thread called "shadow" thread. Both
>>contexts share the same stack and program counter, so that at any time,
>>at least one of the two contexts is seen as suspended by the scheduler
>>which handles it.
>>
>>Before xnshadow_harden is called, the Linux thread is running, and its
>>shadow is seen in suspended state with XNRELAX bit by Xenomai
>>scheduler. After xnshadow_harden, the Linux context is seen suspended
>>with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
>>running by Xenomai scheduler.
>>
>>The migrating thread
>> > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
>> > wake_up_interruptible_sync() call. Is this thread actually run or does it
>> > merely put the thread in some Linux to-do list (I assumed the first case) ?
>>
>>Here, I am not sure, but it seems that when calling
>>wake_up_interruptible_sync the woken up task is put in the current CPU
>>runqueue, and this task (i.e. the gatekeeper), will not run until the
>>current thread (i.e. the thread running xnshadow_harden) marks itself as
>>suspended and calls schedule(). Maybe, marking the running thread as
> 
> 
> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
> here - and a switch if the prio of the woken up task is higher.
> 
> BTW, an easy way to enforce the current trouble is to remove the "_sync"
> from wake_up_interruptible. As I understand it this _sync is just an
> optimisation hint for Linux to avoid needless scheduler runs.
> 

Doing so would not guarantee the following execution sequence either:

1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be unlinked from 
the Linux runqueue before the gatekeeper processes the resumption request, 
whatever event the kernel is processing asynchronously in the meantime. This is 
the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our 
toy by stealing the CPU from the hardening thread whilst keeping it linked to the 
runqueue: upon return from such preemption, the gatekeeper might have run already,
hence the newly hardened thread ends up being seen as runnable by both the Linux
and Xeno schedulers. Rainy day indeed.
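
To make that failure mode concrete, here is a rough, non-verbatim sketch of the
preempt_schedule_irq() idea on 2.6 kernels (the _sketch suffix is there to make
clear this is an illustration, not the actual source): the PREEMPT_ACTIVE bit it
raises is precisely what makes schedule() skip deactivate_task() for the
preempted task, so the hardening thread stays linked to the runqueue.

#include <linux/preempt.h>
#include <linux/sched.h>

static void preempt_schedule_irq_sketch(void)
{
    add_preempt_count(PREEMPT_ACTIVE);  /* mark this as an involuntary preemption */
    local_irq_enable();
    schedule();                         /* prev->state is ignored: no deactivate_task() */
    local_irq_disable();
    sub_preempt_count(PREEMPT_ACTIVE);
}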

We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually promotes it to 
the Xenomai scheduling mode and downgrades this priority back to normal, but we 
would pay additional latencies induced by each aborted rescheduling attempt that 
may occur during the atomic path we want to enforce.
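
For illustration only, a minimal sketch of what that rejected alternative could
look like; the helper name harden_with_prio_boost(), its gk_waitq parameter and
the error handling are assumptions of mine, not code anyone proposed:

#include <linux/sched.h>
#include <linux/wait.h>

static int harden_with_prio_boost(wait_queue_head_t *gk_waitq)
{
    /* Boost current to the top SCHED_FIFO level so that no other Linux
       task can preempt it between the wake-up and schedule(). */
    struct sched_param boost = { .sched_priority = MAX_RT_PRIO - 1 };
    int err = sched_setscheduler(current, SCHED_FIFO, &boost);

    if (err)
        return err;

    set_current_state(TASK_INTERRUPTIBLE);
    wake_up_interruptible_sync(gk_waitq);
    schedule();

    /* In this scheme the gatekeeper would downgrade the priority again once
       the thread runs in primary mode; a failure path would have to restore
       the saved policy and priority here instead. */
    return 0;
}

Each aborted rescheduling attempt between sched_setscheduler() and schedule()
would still enter the scheduler and bounce back, which is the latency cost
mentioned above.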

The other way is to make sure that no in-kernel preemption of the hardening task 
could occur after step 1) and until step 2) is performed, given that we cannot 
currently call schedule() with interrupts or preemption off. I'm on it.

> 
>>suspended is not needed, since the gatekeeper may have a high priority,
>>and calling schedule() is enough. In any case, the woken-up thread does
>>not seem to be run immediately, so this rather looks like the second
>>case.
>>
>>Since in xnshadow_harden, the running thread marks itself as suspended
>>before running wake_up_interruptible_sync, the gatekeeper will run when
>>schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
>>configuration. In the non-preempt case, the current thread will be
>>suspended and the gatekeeper will run when schedule() is explicitly
>>called in xnshadow_harden(). In the preempt case, schedule gets called
>>when the outermost spinlock is unlocked in wake_up_interruptible_sync().
>>
>> > And how does it terminate: is only the system call migrated or is the thread
>> > allowed to continue run (at a priority level equal to the Xenomai
>> > priority level) until it hits something of the Xenomai API (or trivially:
>> > explicitly go to RT using the API) ? 
>>
>>I am not sure I follow you here. The usual case is that the thread will
>>remain in primary mode after the system call, but I think a system call
>>flag allows the other behaviour. So, if I understand the question
>>correctly, the answer is that it depends on the system call.
>>
>> > In that case, I expect the nRT thread to terminate with a schedule()
>> > call in the Xeno OS API code which deactivates the task so that it
>> > won't ever run in Linux context anymore. A top priority gatekeeper is
>> > in place as a software hook to catch Linux's attention right after
>> > that schedule(), which might otherwise schedule something else (and
>> > leave only interrupts for Xenomai to come back to life again).
>>
>>Here is the way I understand it. We have two threads, or rather two
>>"views" of the same thread, with each its state. Switching from
>>secondary to primary mode, i.e. xnshadow_harden and gatekeeper job,
>>means changing the two states at once. Since we can not do that, we need
>>an intermediate state. Since the intermediate state can not be the state
>>where the two threads are running (they share the same stack and
>>program counter), the intermediate state is a state where the two
>>threads are suspended, but another context needs running, it is the
>>gatekeeper.
>>
>> >  I have
>> > the impression that I cannot see this gatekeeper, nor the (n)RT
>> > threads using the ps command ?
>>
>>The gatekeeper and Xenomai user-space threads are regular Linux
>>contexts, you can see them using the ps command.
>>
>> > 
>> > Is it correct to state that the current preemption issue is due to the
>> > gatekeeper being invoked too soon ? Could someone knowing more about the
>> > migration technology explain what exactly goes wrong ?
>>
>>Jan seems to have found such an issue here. I am not sure I understood
>>what he wrote. But if the issue is due to CONFIG_PREEMPT, it explains
>>why I could not observe the bug, I only have the "voluntary preempt"
>>option enabled.
>>
>>I will now try and activate CONFIG_PREEMPT, so as to try and understand
>>what Jan wrote, and tell you more later.
>>
> 
> 
> Hardly anyone understands me, it's so sad... ;(
> 
> Jan
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core


-- 

Philippe.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 14:51         ` Philippe Gerum
@ 2006-01-30 15:33           ` Philippe Gerum
  2006-01-30 16:01             ` Jan Kiszka
  2006-01-30 15:35           ` Philippe Gerum
  1 sibling, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2006-01-30 15:33 UTC (permalink / raw)
  To: xenomai; +Cc: Jan Kiszka

Philippe Gerum wrote:
> Jan Kiszka wrote:
> 
>> Gilles Chanteperdrix wrote:
>>
>>> Jeroen Van den Keybus wrote:
>>> > Hello,
>>> > > > I'm currently not at a level to participate in your discussion. 
>>> Although I'm
>>> > willing to supply you with stresstests, I would nevertheless like 
>>> to learn
>>> > more from task migration as this debugging session proceeds. In 
>>> order to do
>>> > so, please confirm the following statements or indicate where I 
>>> went wrong.
>>> > I hope others may learn from this as well.
>>> > > xn_shadow_harden(): This is called whenever a Xenomai thread 
>>> performs a
>>> > Linux (root domain) system call (notified by Adeos ?).
>>> xnshadow_harden() is called whenever a thread running in secondary
>>> mode (that is, running as a regular Linux thread, handled by Linux
>>> scheduler) is switching to primary mode (where it will run as a Xenomai
>>> thread, handled by Xenomai scheduler). Migrations occur for some system
>>> calls. More precisely, Xenomai skin system calls tables associates a few
>>> flags with each system call, and some of these flags cause migration of
>>> the caller when it issues the system call.
>>>
>>> Each Xenomai user-space thread has two contexts, a regular Linux
>>> thread context, and a Xenomai thread called "shadow" thread. Both
>>> contexts share the same stack and program counter, so that at any time,
>>> at least one of the two contexts is seen as suspended by the scheduler
>>> which handles it.
>>>
>>> Before xnshadow_harden is called, the Linux thread is running, and its
>>> shadow is seen in suspended state with XNRELAX bit by Xenomai
>>> scheduler. After xnshadow_harden, the Linux context is seen suspended
>>> with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
>>> running by Xenomai scheduler.
>>>
>>> The migrating thread
>>> > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
>>> > wake_up_interruptible_sync() call. Is this thread actually run or 
>>> does it
>>> > merely put the thread in some Linux to-do list (I assumed the first 
>>> case) ?
>>>
>>> Here, I am not sure, but it seems that when calling
>>> wake_up_interruptible_sync the woken up task is put in the current CPU
>>> runqueue, and this task (i.e. the gatekeeper), will not run until the
>>> current thread (i.e. the thread running xnshadow_harden) marks itself as
>>> suspended and calls schedule(). Maybe, marking the running thread as
>>
>>
>>
>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>> here - and a switch if the prio of the woken up task is higher.
>>
>> BTW, an easy way to enforce the current trouble is to remove the "_sync"
>> from wake_up_interruptible. As I understand it this _sync is just an
>> optimisation hint for Linux to avoid needless scheduler runs.
>>
> 
> You could not guarantee the following execution sequence doing so 
> either, i.e.
> 
> 1- current wakes up the gatekeeper
> 2- current goes sleeping to exit the Linux runqueue in schedule()
> 3- the gatekeeper resumes the shadow-side of the old current
> 
> The point is all about making 100% sure that current is going to be 
> unlinked from the Linux runqueue before the gatekeeper processes the 
> resumption request, whatever event the kernel is processing 
> asynchronously in the meantime. This is the reason why, as you already 
> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the 
> CPU from the hardening thread whilst keeping it linked to the runqueue: 
> upon return from such preemption, the gatekeeper might have run already, 
>  hence the newly hardened thread ends up being seen as runnable by both 
> the Linux and Xeno schedulers. Rainy day indeed.
> 
> We could rely on giving "current" the highest SCHED_FIFO priority in 
> xnshadow_harden() before waking up the gk, until the gk eventually 
> promotes it to the Xenomai scheduling mode and downgrades this priority 
> back to normal, but we would pay additional latencies induced by each 
> aborted rescheduling attempt that may occur during the atomic path we 
> want to enforce.
> 
> The other way is to make sure that no in-kernel preemption of the 
> hardening task could occur after step 1) and until step 2) is performed, 
> given that we cannot currently call schedule() with interrupts or 
> preemption off. I'm on it.
> 

Could anyone interested in this issue test the following couple of patches?

atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

Both patches are needed to fix the issue.

TIA,

-- 

Philippe.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 14:51         ` Philippe Gerum
  2006-01-30 15:33           ` Philippe Gerum
@ 2006-01-30 15:35           ` Philippe Gerum
  2006-01-31 21:09             ` Jeroen Van den Keybus
  1 sibling, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2006-01-30 15:35 UTC (permalink / raw)
  To: xenomai; +Cc: Jan Kiszka

[-- Attachment #1: Type: text/plain, Size: 4694 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
> 
>> Gilles Chanteperdrix wrote:
>>
>>> Jeroen Van den Keybus wrote:
>>> > Hello,
>>> > > > I'm currently not at a level to participate in your discussion. 
>>> Although I'm
>>> > willing to supply you with stresstests, I would nevertheless like 
>>> to learn
>>> > more from task migration as this debugging session proceeds. In 
>>> order to do
>>> > so, please confirm the following statements or indicate where I 
>>> went wrong.
>>> > I hope others may learn from this as well.
>>> > > xn_shadow_harden(): This is called whenever a Xenomai thread 
>>> performs a
>>> > Linux (root domain) system call (notified by Adeos ?).
>>> xnshadow_harden() is called whenever a thread running in secondary
>>> mode (that is, running as a regular Linux thread, handled by Linux
>>> scheduler) is switching to primary mode (where it will run as a Xenomai
>>> thread, handled by Xenomai scheduler). Migrations occur for some system
>>> calls. More precisely, Xenomai skin system calls tables associates a few
>>> flags with each system call, and some of these flags cause migration of
>>> the caller when it issues the system call.
>>>
>>> Each Xenomai user-space thread has two contexts, a regular Linux
>>> thread context, and a Xenomai thread called "shadow" thread. Both
>>> contexts share the same stack and program counter, so that at any time,
>>> at least one of the two contexts is seen as suspended by the scheduler
>>> which handles it.
>>>
>>> Before xnshadow_harden is called, the Linux thread is running, and its
>>> shadow is seen in suspended state with XNRELAX bit by Xenomai
>>> scheduler. After xnshadow_harden, the Linux context is seen suspended
>>> with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
>>> running by Xenomai scheduler.
>>>
>>> The migrating thread
>>> > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
>>> > wake_up_interruptible_sync() call. Is this thread actually run or 
>>> does it
>>> > merely put the thread in some Linux to-do list (I assumed the first 
>>> case) ?
>>>
>>> Here, I am not sure, but it seems that when calling
>>> wake_up_interruptible_sync the woken up task is put in the current CPU
>>> runqueue, and this task (i.e. the gatekeeper), will not run until the
>>> current thread (i.e. the thread running xnshadow_harden) marks itself as
>>> suspended and calls schedule(). Maybe, marking the running thread as
>>
>>
>>
>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>> here - and a switch if the prio of the woken up task is higher.
>>
>> BTW, an easy way to enforce the current trouble is to remove the "_sync"
>> from wake_up_interruptible. As I understand it this _sync is just an
>> optimisation hint for Linux to avoid needless scheduler runs.
>>
> 
> You could not guarantee the following execution sequence doing so 
> either, i.e.
> 
> 1- current wakes up the gatekeeper
> 2- current goes sleeping to exit the Linux runqueue in schedule()
> 3- the gatekeeper resumes the shadow-side of the old current
> 
> The point is all about making 100% sure that current is going to be 
> unlinked from the Linux runqueue before the gatekeeper processes the 
> resumption request, whatever event the kernel is processing 
> asynchronously in the meantime. This is the reason why, as you already 
> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the 
> CPU from the hardening thread whilst keeping it linked to the runqueue: 
> upon return from such preemption, the gatekeeper might have run already, 
>  hence the newly hardened thread ends up being seen as runnable by both 
> the Linux and Xeno schedulers. Rainy day indeed.
> 
> We could rely on giving "current" the highest SCHED_FIFO priority in 
> xnshadow_harden() before waking up the gk, until the gk eventually 
> promotes it to the Xenomai scheduling mode and downgrades this priority 
> back to normal, but we would pay additional latencies induced by each 
> aborted rescheduling attempt that may occur during the atomic path we 
> want to enforce.
> 
> The other way is to make sure that no in-kernel preemption of the 
> hardening task could occur after step 1) and until step 2) is performed, 
> given that we cannot currently call schedule() with interrupts or 
> preemption off. I'm on it.
> 

 > Could anyone interested in this issue test the following couple of patches?

 > atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
 > atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

 > Both patches are needed to fix the issue.

 > TIA,

And now, Ladies and Gentlemen, with the patches attached.

-- 

Philippe.


[-- Attachment #2: atomic-switch-state.patch --]
[-- Type: text/x-patch, Size: 1348 bytes --]

--- 2.6.15-x86/kernel/sched.c	2006-01-07 15:18:31.000000000 +0100
+++ 2.6.15-ipipe/kernel/sched.c	2006-01-30 15:15:27.000000000 +0100
@@ -2963,7 +2963,7 @@
 	 * Otherwise, whine if we are scheduling when we should not be.
 	 */
 	if (likely(!current->exit_state)) {
-		if (unlikely(in_atomic())) {
+		if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) {
 			printk(KERN_ERR "scheduling while atomic: "
 				"%s/0x%08x/%d\n",
 				current->comm, preempt_count(), current->pid);
@@ -2972,8 +2972,13 @@
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_address(0));
 
+	if (unlikely(current->state & TASK_ATOMICSWITCH)) {
+		current->state &= ~TASK_ATOMICSWITCH;
+		goto preemption_off;
+	}
 need_resched:
 	preempt_disable();
+preemption_off:
 #ifdef CONFIG_IPIPE
 	if (unlikely(ipipe_current_domain != ipipe_root_domain)) {
 		preempt_enable();
--- 2.6.15-x86/include/linux/sched.h	2006-01-07 15:18:31.000000000 +0100
+++ 2.6.15-ipipe/include/linux/sched.h	2006-01-30 15:14:43.000000000 +0100
@@ -128,6 +128,11 @@
 #define EXIT_DEAD		32
 /* in tsk->state again */
 #define TASK_NONINTERACTIVE	64
+#ifdef CONFIG_IPIPE
+#define TASK_ATOMICSWITCH	512
+#else  /* !CONFIG_IPIPE */
+#define TASK_ATOMICSWITCH	0
+#endif /* CONFIG_IPIPE */
 
 #define __set_task_state(tsk, state_value)		\
 	do { (tsk)->state = (state_value); } while (0)

[-- Attachment #3: atomic-wakeup-and-schedule.patch --]
[-- Type: text/x-patch, Size: 4489 bytes --]

Index: include/asm-generic/wrappers.h
===================================================================
--- include/asm-generic/wrappers.h	(revision 487)
+++ include/asm-generic/wrappers.h	(working copy)
@@ -60,6 +60,10 @@
 /* Sched */
 #define MAX_RT_PRIO 100
 #define task_cpu(p) ((p)->processor)
+#ifndef CONFIG_PREEMPT
+#define preempt_disable()  do { } while(0)
+#define preempt_enable()   do { } while(0)
+#endif /* CONFIG_PREEMPT */
 
 /* Signals */
 #define wrap_sighand_lock(p)     ((p)->sigmask_lock)
Index: include/asm-generic/hal.h
===================================================================
--- include/asm-generic/hal.h	(revision 487)
+++ include/asm-generic/hal.h	(working copy)
@@ -216,6 +216,11 @@
 #define IPIPE_EVENT_SELF  0
 #endif /* !IPIPE_EVENT_SELF */
 
+#ifndef TASK_ATOMICSWITCH
+/* Some early I-pipe versions don't have this either. */
+#define TASK_ATOMICSWITCH  0
+#endif /* !TASK_ATOMICSWITCH */
+
 #define rthal_catch_taskexit(hdlr)	\
     ipipe_catch_event(ipipe_root_domain,IPIPE_EVENT_EXIT,hdlr)
 #define rthal_catch_sigwake(hdlr)	\
Index: ksrc/nucleus/shadow.c
===================================================================
--- ksrc/nucleus/shadow.c	(revision 487)
+++ ksrc/nucleus/shadow.c	(working copy)
@@ -453,50 +453,33 @@
        the Xenomai domain. This will cause the shadow thread to resume
        using the register state of the current Linux task. For this to
        happen, we set up the migration data, prepare to suspend the
-       current task then wake up the gatekeeper which will perform the
-       actual transition. */
+       current task, wake up the gatekeeper which will perform the
+       actual transition, then schedule out. Most of this sequence
+       must be atomic, and we get this guarantee by disabling
+       preemption and using the TASK_ATOMICSWITCH cumulative state
+       provided by Adeos to Linux tasks. */
 
     gk->thread = thread;
-    set_current_state(TASK_INTERRUPTIBLE);
+    preempt_disable();
+    set_current_state(TASK_INTERRUPTIBLE|TASK_ATOMICSWITCH);
     wake_up_interruptible_sync(&gk->waitq);
+    schedule();
 
-    if (rthal_current_domain == rthal_root_domain) {
+    /* Rare case: we might have been awaken by a signal before the
+       gatekeeper sent us to primary mode. Since TASK_UNINTERRUPTIBLE
+       is unavailable to us without wrecking the runqueue's count of
+       uniniterruptible tasks, we just notice the issue and gracefully
+       fail; the caller will have to process this signal anyway. */
 
-        /* On non-preemptible kernels, we always enter this code,
-	   since there is no preemption opportunity before we
-	   explicitely call schedule(). On preemptible kernels, we
-	   might have been switched out on our way in/out
-	   wake_up_interruptible_sync(), and scheduled back after the
-	   gatekeeper kicked the Xenomai scheduler. In such a case, we
-	   need to check the current Adeos domain: if this is Xenomai,
-	   then the switch has already taken place and the current
-	   task is already running in primary mode; if it's not, then
-	   we need to call schedule() in order to force the current
-	   task out and let the gatekeeper switch us back in primary
-	   mode. The small race window between the test and the call
-	   to schedule() is closed by the latter routine, which denies
-	   rescheduling over non-root domains (I-pipe patches >=
-	   1.0-08 for ppc, or 1.0-12 for x86). */
-
-	schedule();
-
-	/* Rare case: we might have been awaken by a signal before the
-	   gatekeeper sent us to primary mode. Since
-	   TASK_UNINTERRUPTIBLE is unavailable to us without wrecking
-	   the runqueue's count of uniniterruptible tasks, we just
-	   notice the issue and gracefully fail; the caller will have
-	   to process this signal anyway. */
-
-	if (rthal_current_domain == rthal_root_domain) {
+    if (rthal_current_domain == rthal_root_domain) {
 #ifdef CONFIG_XENO_OPT_DEBUG
-	    if (!signal_pending(this_task) ||
-		this_task->state != TASK_RUNNING)
-		xnpod_fatal("xnshadow_harden() failed for thread %s[%d]",
-			    thread->name,
-			    xnthread_user_pid(thread));
+    	if (!signal_pending(this_task) ||
+	    this_task->state != TASK_RUNNING)
+	    xnpod_fatal("xnshadow_harden() failed for thread %s[%d]",
+			thread->name,
+			xnthread_user_pid(thread));
 #endif /* CONFIG_XENO_OPT_DEBUG */
-	    return -ERESTARTSYS;
-	}
+	return -ERESTARTSYS;
     }
 
     /* "current" is now running into the Xenomai domain. */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 15:33           ` Philippe Gerum
@ 2006-01-30 16:01             ` Jan Kiszka
  2006-01-30 23:10               ` Philippe Gerum
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2006-01-30 16:01 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 5135 bytes --]

Philippe Gerum wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>
>>> Gilles Chanteperdrix wrote:
>>>
>>>> Jeroen Van den Keybus wrote:
>>>> > Hello,
>>>> > > > I'm currently not at a level to participate in your
>>>> discussion. Although I'm
>>>> > willing to supply you with stresstests, I would nevertheless like
>>>> to learn
>>>> > more from task migration as this debugging session proceeds. In
>>>> order to do
>>>> > so, please confirm the following statements or indicate where I
>>>> went wrong.
>>>> > I hope others may learn from this as well.
>>>> > > xn_shadow_harden(): This is called whenever a Xenomai thread
>>>> performs a
>>>> > Linux (root domain) system call (notified by Adeos ?).
>>>> xnshadow_harden() is called whenever a thread running in secondary
>>>> mode (that is, running as a regular Linux thread, handled by Linux
>>>> scheduler) is switching to primary mode (where it will run as a Xenomai
>>>> thread, handled by Xenomai scheduler). Migrations occur for some system
>>>> calls. More precisely, Xenomai skin system calls tables associates a
>>>> few
>>>> flags with each system call, and some of these flags cause migration of
>>>> the caller when it issues the system call.
>>>>
>>>> Each Xenomai user-space thread has two contexts, a regular Linux
>>>> thread context, and a Xenomai thread called "shadow" thread. Both
>>>> contexts share the same stack and program counter, so that at any time,
>>>> at least one of the two contexts is seen as suspended by the scheduler
>>>> which handles it.
>>>>
>>>> Before xnshadow_harden is called, the Linux thread is running, and its
>>>> shadow is seen in suspended state with XNRELAX bit by Xenomai
>>>> scheduler. After xnshadow_harden, the Linux context is seen suspended
>>>> with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
>>>> running by Xenomai scheduler.
>>>>
>>>> The migrating thread
>>>> > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
>>>> > wake_up_interruptible_sync() call. Is this thread actually run or
>>>> does it
>>>> > merely put the thread in some Linux to-do list (I assumed the
>>>> first case) ?
>>>>
>>>> Here, I am not sure, but it seems that when calling
>>>> wake_up_interruptible_sync the woken up task is put in the current CPU
>>>> runqueue, and this task (i.e. the gatekeeper), will not run until the
>>>> current thread (i.e. the thread running xnshadow_harden) marks
>>>> itself as
>>>> suspended and calls schedule(). Maybe, marking the running thread as
>>>
>>>
>>>
>>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>>> here - and a switch if the prio of the woken up task is higher.
>>>
>>> BTW, an easy way to enforce the current trouble is to remove the "_sync"
>>> from wake_up_interruptible. As I understand it this _sync is just an
>>> optimisation hint for Linux to avoid needless scheduler runs.
>>>
>>
>> You could not guarantee the following execution sequence doing so
>> either, i.e.
>>
>> 1- current wakes up the gatekeeper
>> 2- current goes sleeping to exit the Linux runqueue in schedule()
>> 3- the gatekeeper resumes the shadow-side of the old current
>>
>> The point is all about making 100% sure that current is going to be
>> unlinked from the Linux runqueue before the gatekeeper processes the
>> resumption request, whatever event the kernel is processing
>> asynchronously in the meantime. This is the reason why, as you already
>> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the
>> CPU from the hardening thread whilst keeping it linked to the
>> runqueue: upon return from such preemption, the gatekeeper might have
>> run already,  hence the newly hardened thread ends up being seen as
>> runnable by both the Linux and Xeno schedulers. Rainy day indeed.
>>
>> We could rely on giving "current" the highest SCHED_FIFO priority in
>> xnshadow_harden() before waking up the gk, until the gk eventually
>> promotes it to the Xenomai scheduling mode and downgrades this
>> priority back to normal, but we would pay additional latencies induced
>> by each aborted rescheduling attempt that may occur during the atomic
>> path we want to enforce.
>>
>> The other way is to make sure that no in-kernel preemption of the
>> hardening task could occur after step 1) and until step 2) is
>> performed, given that we cannot currently call schedule() with
>> interrupts or preemption off. I'm on it.
>>
> 
> Could anyone interested in this issue test the following couple of patches?
> 
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
> 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
> 
> Both patches are needed to fix the issue.
> 
> TIA,
> 

Looks good. I tried Jeroen's test-case and I was not able to reproduce
the crash anymore. I think it's time for a new ipipe-release. ;)

At this chance: any comments on the panic-freeze extension for the
tracer? I need to rework the Xenomai patch, but the ipipe side should be
ready for merge.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 16:01             ` Jan Kiszka
@ 2006-01-30 23:10               ` Philippe Gerum
  2006-01-31 19:01                 ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2006-01-30 23:10 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Philippe Gerum wrote:
> 
>>Philippe Gerum wrote:
>>
>>>Jan Kiszka wrote:
>>>
>>>
>>>>Gilles Chanteperdrix wrote:
>>>>
>>>>
>>>>>Jeroen Van den Keybus wrote:
>>>>>
>>>>>>Hello,
>>>>>>
>>>>>>>>I'm currently not at a level to participate in your
>>>>>
>>>>>discussion. Although I'm
>>>>>
>>>>>>willing to supply you with stresstests, I would nevertheless like
>>>>>
>>>>>to learn
>>>>>
>>>>>>more from task migration as this debugging session proceeds. In
>>>>>
>>>>>order to do
>>>>>
>>>>>>so, please confirm the following statements or indicate where I
>>>>>
>>>>>went wrong.
>>>>>
>>>>>>I hope others may learn from this as well.
>>>>>>
>>>>>>>xn_shadow_harden(): This is called whenever a Xenomai thread
>>>>>
>>>>>performs a
>>>>>
>>>>>>Linux (root domain) system call (notified by Adeos ?).
>>>>>
>>>>>xnshadow_harden() is called whenever a thread running in secondary
>>>>>mode (that is, running as a regular Linux thread, handled by Linux
>>>>>scheduler) is switching to primary mode (where it will run as a Xenomai
>>>>>thread, handled by Xenomai scheduler). Migrations occur for some system
>>>>>calls. More precisely, Xenomai skin system calls tables associates a
>>>>>few
>>>>>flags with each system call, and some of these flags cause migration of
>>>>>the caller when it issues the system call.
>>>>>
>>>>>Each Xenomai user-space thread has two contexts, a regular Linux
>>>>>thread context, and a Xenomai thread called "shadow" thread. Both
>>>>>contexts share the same stack and program counter, so that at any time,
>>>>>at least one of the two contexts is seen as suspended by the scheduler
>>>>>which handles it.
>>>>>
>>>>>Before xnshadow_harden is called, the Linux thread is running, and its
>>>>>shadow is seen in suspended state with XNRELAX bit by Xenomai
>>>>>scheduler. After xnshadow_harden, the Linux context is seen suspended
>>>>>with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
>>>>>running by Xenomai scheduler.
>>>>>
>>>>>The migrating thread
>>>>>
>>>>>>(nRT) is marked INTERRUPTIBLE and run by the Linux kernel
>>>>>>wake_up_interruptible_sync() call. Is this thread actually run or
>>>>>
>>>>>does it
>>>>>
>>>>>>merely put the thread in some Linux to-do list (I assumed the
>>>>>
>>>>>first case) ?
>>>>>
>>>>>Here, I am not sure, but it seems that when calling
>>>>>wake_up_interruptible_sync the woken up task is put in the current CPU
>>>>>runqueue, and this task (i.e. the gatekeeper), will not run until the
>>>>>current thread (i.e. the thread running xnshadow_harden) marks
>>>>>itself as
>>>>>suspended and calls schedule(). Maybe, marking the running thread as
>>>>
>>>>
>>>>
>>>>Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>>>>here - and a switch if the prio of the woken up task is higher.
>>>>
>>>>BTW, an easy way to enforce the current trouble is to remove the "_sync"
>>>>from wake_up_interruptible. As I understand it this _sync is just an
>>>>optimisation hint for Linux to avoid needless scheduler runs.
>>>>
>>>
>>>You could not guarantee the following execution sequence doing so
>>>either, i.e.
>>>
>>>1- current wakes up the gatekeeper
>>>2- current goes sleeping to exit the Linux runqueue in schedule()
>>>3- the gatekeeper resumes the shadow-side of the old current
>>>
>>>The point is all about making 100% sure that current is going to be
>>>unlinked from the Linux runqueue before the gatekeeper processes the
>>>resumption request, whatever event the kernel is processing
>>>asynchronously in the meantime. This is the reason why, as you already
>>>noticed, preempt_schedule_irq() nicely breaks our toy by stealing the
>>>CPU from the hardening thread whilst keeping it linked to the
>>>runqueue: upon return from such preemption, the gatekeeper might have
>>>run already,  hence the newly hardened thread ends up being seen as
>>>runnable by both the Linux and Xeno schedulers. Rainy day indeed.
>>>
>>>We could rely on giving "current" the highest SCHED_FIFO priority in
>>>xnshadow_harden() before waking up the gk, until the gk eventually
>>>promotes it to the Xenomai scheduling mode and downgrades this
>>>priority back to normal, but we would pay additional latencies induced
>>>by each aborted rescheduling attempt that may occur during the atomic
>>>path we want to enforce.
>>>
>>>The other way is to make sure that no in-kernel preemption of the
>>>hardening task could occur after step 1) and until step 2) is
>>>performed, given that we cannot currently call schedule() with
>>>interrupts or preemption off. I'm on it.
>>>
>>
>>Could anyone interested in this issue test the following couple of patches?
>>
>>atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
>>2.6.15
>>atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
>>
>>Both patches are needed to fix the issue.
>>
>>TIA,
>>
> 
> 
> Looks good. I tried Jeroen's test-case and I was not able to reproduce
> the crash anymore. I think it's time for a new ipipe-release. ;)
>

Looks like, indeed.

> At this chance: any comments on the panic-freeze extension for the
> tracer? I need to rework the Xenomai patch, but the ipipe side should be
> ready for merge.
> 

No issue with the ipipe side, since it only touches the tracer support code. No
issue either at first sight with the Xeno side, aside from the trace being frozen
twice in do_schedule_event (once in this routine, a second time in xnpod_fatal);
but maybe that is intentional, to freeze the situation before the stack is dumped; is it?

I'm queuing the ipipe side patch for 1.2, which will also provide the support we 
need for atomic scheduling in order to solve the migration bug.

-- 

Philippe.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 23:10               ` Philippe Gerum
@ 2006-01-31 19:01                 ` Jan Kiszka
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kiszka @ 2006-01-31 19:01 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> At this chance: any comments on the panic-freeze extension for the
>> tracer? I need to rework the Xenomai patch, but the ipipe side should be
>> ready for merge.
>>
> 
> No issue with the ipipe side since it only touches the tracer support
> code. No issue either at first sight with the Xeno side, aside of the
> trace being frozen twice in do_schedule_event? (once in this routine,
> twice in xnpod_fatal); but maybe it's wanted to freeze the situation
> before the stack is dumped; is it?

Yes, this is the reason for it. Actually, only the first freeze has any
effect; later calls are ignored.

Hmm, I thought I remembered some issue with the Xenomai-side patch when
tracing was disabled, but I cannot reproduce it anymore (it was likely
related to other hacks while tracking down the PREEMPT issue). So from
my POV that patch is ready for merge as well.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-30 15:35           ` Philippe Gerum
@ 2006-01-31 21:09             ` Jeroen Van den Keybus
  2006-01-31 21:45               ` Philippe Gerum
  0 siblings, 1 reply; 27+ messages in thread
From: Jeroen Van den Keybus @ 2006-01-31 21:09 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

[-- Attachment #1: Type: text/plain, Size: 248 bytes --]

>
> And now, Ladies and Gentlemen, with the patches attached.


I've installed both patches and the problem seems to have disappeared. I'll
try it on another machine tomorrow, too. Meanwhile: thanks very much for the
assistance !

Jeroen.

[-- Attachment #2: Type: text/html, Size: 431 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-31 21:09             ` Jeroen Van den Keybus
@ 2006-01-31 21:45               ` Philippe Gerum
  2006-02-01  9:57                 ` Jeroen Van den Keybus
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2006-01-31 21:45 UTC (permalink / raw)
  To: Jeroen Van den Keybus; +Cc: Jan Kiszka, xenomai

Jeroen Van den Keybus wrote:
>     And now, Ladies and Gentlemen, with the patches attached.
> 
>  
> I've installed both patches and the problem seems to have disappeared. 
> I'll try it on another machine tomorrow, too. Meanwhile: thanks very 
> much for the assistance !
>

Actually, the effort you made to provide a streamlined testcase that triggered the 
bug did most of the job, so you are the one to thank here. The rest was only a 
matter of dealing with my own bugs, which is a sisyphean activity I'm rather 
familiar with.

> Jeroen.
>  


-- 

Philippe.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-01-31 21:45               ` Philippe Gerum
@ 2006-02-01  9:57                 ` Jeroen Van den Keybus
  2006-02-01 10:03                   ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Jeroen Van den Keybus @ 2006-02-01  9:57 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

[-- Attachment #1: Type: text/plain, Size: 475 bytes --]

>
> > I've installed both patches and the problem seems to have disappeared.
> > I'll try it on another machine tomorrow, too. Meanwhile: thanks very
> > much for the assistance !


While testing more thoroughly, my triggers for zero mutex values after
acquiring the lock are going off again. I was using the Xenomai SVN
development tree, but I've now switched to the (fixed) 2.1-rc2 in order to
apply the patches. Is Jan's bugfix included in that one?

Jeroen.

[-- Attachment #2: Type: text/html, Size: 713 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-02-01  9:57                 ` Jeroen Van den Keybus
@ 2006-02-01 10:03                   ` Jan Kiszka
  2006-02-01 12:23                     ` Jeroen Van den Keybus
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2006-02-01 10:03 UTC (permalink / raw)
  To: Jeroen Van den Keybus; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

Jeroen Van den Keybus wrote:
>>> I've installed both patches and the problem seems to have disappeared.
>>> I'll try it on another machine tomorrow, too. Meanwhile: thanks very
>>> much for the assistance !
> 
> 
> While testing more thoroughly, my triggers for zero mutex values after
> acquiring the lock are going off again. I was using the SVN xenomai
> development tree, but I've now switched to the (fixed) 2.1-rc2 in order to
> apply the patches. Is Jan's bugfix included in that one ?

Revision 466 contains the mutex-info fix, but that is post-rc2. Why not
switch to SVN head?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-02-01 10:03                   ` Jan Kiszka
@ 2006-02-01 12:23                     ` Jeroen Van den Keybus
  2006-02-01 12:34                       ` Jan Kiszka
  0 siblings, 1 reply; 27+ messages in thread
From: Jeroen Van den Keybus @ 2006-02-01 12:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

>
> Revision 466 contains the mutex-info fix, but that is post -rc2. Why not
> switching to SVN head?


Philippe asked me to apply the patch against Xenomai 2.1-rc2. Can I safely
apply it against the SVN tree? After that, what will 'svn up' do to the
patched tree?

Remember I'm quite new to Linux. Actually, I spent half an hour finding out
how that patch stuff (especially the -p option) works.



Jeroen.

[-- Attachment #2: Type: text/html, Size: 633 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
  2006-02-01 12:23                     ` Jeroen Van den Keybus
@ 2006-02-01 12:34                       ` Jan Kiszka
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Kiszka @ 2006-02-01 12:34 UTC (permalink / raw)
  To: Jeroen Van den Keybus; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 886 bytes --]

Jeroen Van den Keybus wrote:
>> Revision 466 contains the mutex-info fix, but that is post -rc2. Why not
>> switching to SVN head?
> 
> 
> Philippe asked to apply the patch against Xenomai 2.1-rc2. Can I safely
> patch it against the SVN tree ? After that, what will 'svn up' do to the
> patched tree ?

The CONFIG_PREEMPT fix is already contained in the latest SVN revision,
no need to patch anymore.

When unsure whether a patch will apply cleanly, try "patch --dry-run" first.
(Virtually) rejected hunks can then be used to assess whether the patch fits,
without messing up the code base immediately.

> 
> Remember I'm quite new to Linux. Actually, I spent half an hour finding out
> how that patch stuff (especially the -p option) works.
> 

:) (it's no problem to ask even these kinds of "stupid" questions on the
list or to us directly - no one will bite you!)

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2006-02-01 12:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-21 10:47 [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT Jan Kiszka
2006-01-21 10:51 ` [Xenomai-core] " Jeroen Van den Keybus
2006-01-21 16:47 ` [Xenomai-core] " Hannes Mayer
2006-01-21 17:01   ` Jan Kiszka
2006-01-22  8:10 ` Dmitry Adamushko
2006-01-22 16:19   ` Jeroen Van den Keybus
2006-01-23 18:22     ` Gilles Chanteperdrix
2006-01-23 19:16       ` Jan Kiszka
2006-01-30 14:51         ` Philippe Gerum
2006-01-30 15:33           ` Philippe Gerum
2006-01-30 16:01             ` Jan Kiszka
2006-01-30 23:10               ` Philippe Gerum
2006-01-31 19:01                 ` Jan Kiszka
2006-01-30 15:35           ` Philippe Gerum
2006-01-31 21:09             ` Jeroen Van den Keybus
2006-01-31 21:45               ` Philippe Gerum
2006-02-01  9:57                 ` Jeroen Van den Keybus
2006-02-01 10:03                   ` Jan Kiszka
2006-02-01 12:23                     ` Jeroen Van den Keybus
2006-02-01 12:34                       ` Jan Kiszka
2006-01-24 13:14       ` Dmitry Adamushko
2006-01-24 13:26         ` Jan Kiszka
2006-01-30 11:37           ` Dmitry Adamushko
2006-01-30 11:48             ` Jan Kiszka
2006-01-30 13:02               ` Dmitry Adamushko
2006-01-29 23:48 ` Philippe Gerum
2006-01-30 10:14   ` Philippe Gerum
