[Xenomai-help] Handling Linux Signals in primary domain context


* [Xenomai-help] Handling Linux Signals in primary domain context
@ 2010-06-01 13:50 Tschaeche IT-Services
  2010-06-01 13:52 ` Gilles Chanteperdrix
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Tschaeche IT-Services @ 2010-06-01 13:50 UTC (permalink / raw)
  To: xenomai

Hi,

we have the following scenario:

A high-priority periodic primary-domain task (H) calls
rt_task_suspend(L) in every even period and rt_task_resume(L)
in every odd period on a low-priority primary-domain task (L).
L-task consumes all available CPU resources (while(1)).
Thus, the rest of each cycle (after H has had the CPU) is used
alternately by L-task, ROOT-task, L-task, ...

In our debugging implementation, we send a SIGTRAP to L-task.
H-task notices this because rt_task_suspend(L) returns EINTR.
However, the while(1) loop in L-task is not interrupted although a
SIGTRAP is pending.

Our workaround could be to send an rt_signal when rt_task_suspend()
returns EINTR and then, in the rt-signal handler, migrate L-task
to the secondary domain (calling rt_task_set_mode(T_PRIMARY, 0, NULL)).
This invokes the Linux scheduler, which then initiates the SIGTRAP
handling in secondary domain context.

Is there a simpler way to get primary domain tasks interrupted
by Linux signals? Xenomai already knows about the pending signal
and could perhaps initiate the switch to the secondary domain on a
primary scheduler event.

Thanks,

	Olli


* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 13:50 [Xenomai-help] Handling Linux Signals in primary domain context Tschaeche IT-Services
@ 2010-06-01 13:52 ` Gilles Chanteperdrix
  2010-06-01 13:59 ` Gilles Chanteperdrix
  2010-06-01 14:32 ` Philippe Gerum
  2 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-01 13:52 UTC (permalink / raw)
  To: Tschaeche IT-Services; +Cc: xenomai

Tschaeche IT-Services wrote:
> Hi,
> 
> we have the following scenario:
> 
> A high priority periodic primary domain task (H), which calls
> rt_task_suspend(L) in each even period and rt_task_resume(L)
> in each odd period on a low priority primary domain task (L).
> L-task consumes all available CPU resources (while(1)).
> Thus, the rest of each cycle (after H has got the CPU) is used
> alternately by L-task, ROOT-task, L-task,...
> 
> In our debugging implementation, we send a SIGTRAP to L-task.
> H-task recognizes this by reporting EINTR when calling rt_task_suspend(L).
> But, the while(1) in L-task is not interrupted although there is a SIGTRAP
> pending.
> 
> Our workaround could be, to send a rt_signal when rt_task_suspend()
> returns EINTR and, then, in the rt-signal handler migrate L-task
> to secondary domain (calling rt_task_set_mode(T_PRIMARY,0))
> initiating the Linux scheduler, which, then, initiates the SIGTRAP handling
> in secondary domain context.
> 
> Is there a simpler way to get primary domain tasks interrupted
> by Linux signals? Xenomai already knows about the pending signal
> and, maybe, could initiate the secondary domain switch on a primary scheduler
> event.

Could you send us a self-contained minimal program which exhibits this
behaviour?

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 13:50 [Xenomai-help] Handling Linux Signals in primary domain context Tschaeche IT-Services
  2010-06-01 13:52 ` Gilles Chanteperdrix
@ 2010-06-01 13:59 ` Gilles Chanteperdrix
  2010-06-01 14:32 ` Philippe Gerum
  2 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-01 13:59 UTC (permalink / raw)
  To: Tschaeche IT-Services; +Cc: xenomai

Tschaeche IT-Services wrote:
> Hi,
> 
> we have the following scenario:
> 
> A high priority periodic primary domain task (H), which calls
> rt_task_suspend(L) in each even period and rt_task_resume(L)
> in each odd period on a low priority primary domain task (L).
> L-task consumes all available CPU resources (while(1)).
> Thus, the rest of each cycle (after H has got the CPU) is used
> alternately by L-task, ROOT-task, L-task,...
> 
> In our debugging implementation, we send a SIGTRAP to L-task.
> H-task recognizes this by reporting EINTR when calling rt_task_suspend(L).
> But, the while(1) in L-task is not interrupted although there is a SIGTRAP
> pending.

That is expected: the automatic migration from primary mode to secondary
mode when receiving a signal only works if the task emits syscalls.

> 
> Our workaround could be, to send a rt_signal when rt_task_suspend()
> returns EINTR and, then, in the rt-signal handler migrate L-task
> to secondary domain (calling rt_task_set_mode(T_PRIMARY,0))
> initiating the Linux scheduler, which, then, initiates the SIGTRAP handling
> in secondary domain context.
> 
> Is there a simpler way to get primary domain tasks interrupted
> by Linux signals? Xenomai already knows about the pending signal
> and, maybe, could initiate the secondary domain switch on a primary scheduler
> event.

Xenomai tasks are expected to emit system calls from time to time. I am
afraid your use case is somewhat outside what Xenomai was made for.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 13:50 [Xenomai-help] Handling Linux Signals in primary domain context Tschaeche IT-Services
  2010-06-01 13:52 ` Gilles Chanteperdrix
  2010-06-01 13:59 ` Gilles Chanteperdrix
@ 2010-06-01 14:32 ` Philippe Gerum
  2010-06-01 15:54   ` Tschaeche IT-Services
  2 siblings, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2010-06-01 14:32 UTC (permalink / raw)
  To: Tschaeche IT-Services; +Cc: xenomai

On Tue, 2010-06-01 at 15:50 +0200, Tschaeche IT-Services wrote:
> Hi,
> 
> we have the following scenario:
> 
> A high priority periodic primary domain task (H), which calls
> rt_task_suspend(L) in each even period and rt_task_resume(L)
> in each odd period on a low priority primary domain task (L).
> L-task consumes all available CPU resources (while(1)).
> Thus, the rest of each cycle (after H has got the CPU) is used
> alternately by L-task, ROOT-task, L-task,...
> 
> In our debugging implementation, we send a SIGTRAP to L-task.
> H-task recognizes this by reporting EINTR when calling rt_task_suspend(L).
> But, the while(1) in L-task is not interrupted although there is a SIGTRAP
> pending.

Using SIGTRAP will badly conflict with GDB. Hope this is ok.

> 
> Our workaround could be, to send a rt_signal when rt_task_suspend()
> returns EINTR and, then, in the rt-signal handler migrate L-task
> to secondary domain (calling rt_task_set_mode(T_PRIMARY,0))
> initiating the Linux scheduler, which, then, initiates the SIGTRAP handling
> in secondary domain context.
> 
> Is there a simpler way to get primary domain tasks interrupted
> by Linux signals? Xenomai already knows about the pending signal
> and, maybe, could initiate the secondary domain switch on a primary scheduler
> event.

Not in the absence of syscalls. We thought about this once already, when
considering how a watchdog preempting a runaway task in primary mode
could force a secondary mode switch: there is no sane and easy solution
to this unfortunately.

If the basic idea is about throttling the activity of the L-task, then
you could use the sporadic server policy (enabled via
pthread_setschedparam_ex()).
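To picture how a sporadic server throttles a CPU hog like L-task, here is a deliberately simplified discrete-time model of POSIX-style sporadic-server budget accounting. It is a sketch, not Xenomai's implementation: the task keeps its normal priority while it has budget, drops to its low priority once the budget is exhausted, and gets the consumed amount back one replenishment period after it became active. The function name `ss_simulate()` and the tick-based time base are hypothetical, and only one pending replenishment is tracked, which is enough for a task that never blocks.

```c
#include <assert.h>

/*
 * Simplified sporadic-server model for a task that never blocks
 * (e.g. a while(1) loop). Time advances in abstract ticks.
 * Hypothetical helper, not a Xenomai API.
 *
 * budget: ticks the task may run at its normal (high) priority
 * period: replenishment period, counted from the activation time
 * trace:  optional array of `horizon` entries; 1 = ran at high
 *         priority in that tick, 0 = demoted to low priority
 * Returns the number of ticks spent at high priority.
 */
long ss_simulate(long budget, long period, long horizon, int *trace)
{
	long avail = budget;   /* remaining execution budget */
	long activation = -1;  /* tick at which the current high-prio run began */
	long consumed = 0;     /* budget consumed since activation */
	long repl_time = -1;   /* tick of the pending replenishment, -1 if none */
	long repl_amount = 0;
	long high_ticks = 0;
	long t;

	/* With period < budget the replenishment would land in the past. */
	assert(budget > 0 && period >= budget);

	for (t = 0; t < horizon; t++) {
		if (t == repl_time) {
			avail += repl_amount;
			repl_time = -1;
		}
		if (avail > 0) {
			if (activation < 0) {
				activation = t;
				consumed = 0;
			}
			avail--;
			consumed++;
			high_ticks++;
			if (trace)
				trace[t] = 1;
			if (avail == 0) {
				/* Budget exhausted: demote, and give the consumed
				 * amount back one period after activation. */
				repl_time = activation + period;
				repl_amount = consumed;
				activation = -1;
			}
		} else if (trace) {
			trace[t] = 0;
		}
	}
	return high_ticks;
}
```

For example, `ss_simulate(3, 10, 30, NULL)` returns 9: with a budget of 3 ticks per 10-tick period, the spinning task holds its normal priority for only 3 ticks out of every 10 and competes at its low priority for the rest.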

> 
> Thanks,
> 
> 	Olli
> 
> _______________________________________________
> Xenomai-help mailing list
> Xenomai-help@domain.hid
> https://mail.gna.org/listinfo/xenomai-help


-- 
Philippe.





* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 14:32 ` Philippe Gerum
@ 2010-06-01 15:54   ` Tschaeche IT-Services
  2010-06-01 16:52     ` Tschaeche IT-Services
  2010-06-01 16:58     ` Jan Kiszka
  0 siblings, 2 replies; 27+ messages in thread
From: Tschaeche IT-Services @ 2010-06-01 15:54 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]

On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> Not in the absence of syscall. We thought about this once already, when
> considering how a watchdog preempting a runaway task in primary mode
> could force a secondary mode switch: there is no sane and easy solution
> to this unfortunately.

This is exactly Sigmatek's problem: our customers develop code
within our debugging/development environment. We want to catch
this situation (the developer implements a while(1)) with a
watchdog throwing SIGTRAP so that our debugger becomes active
and can locate the problem from the stack frame...

Find attached a separate test case (using SIGTERM, which
should terminate the application). When you press <ENTER>,
the system freezes: work_l() stays in its while() loop with the
signal pending, and work_h() can no longer suspend it
(rt_task_suspend() returns EINTR).

Next, we are going to implement a workaround that sends an rt-signal
when rt_task_suspend() returns EINTR. In the rt-signal
handler we explicitly migrate the task to the secondary
domain, where Linux signal handling is triggered...

Thanks,

	Olli

-- 
Tschaeche IT-Services       Tel.:  +49/9134/9089850
Dr.-Ing. Oliver Tschäche    Mobil: +49/176/20435601
Welluckenweg 4              Email: services@domain.hid
91077 Neunkirchen

[-- Attachment #2: signal2xenomai.c --]
[-- Type: text/x-csrc, Size: 1996 bytes --]

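/* compile with (flags depend on your Xenomai installation): gcc -Wall -D_GNU_SOURCE $(xeno-config --xeno-cflags) -o thisfile thisfile.c $(xeno-config --xeno-ldflags) -lnative */
/*
 * Minimal test case: a low-priority task (work_l) spins in primary
 * mode while a high-priority periodic task (work_h) alternately
 * suspends and resumes it. After main() sends SIGTERM to the spinning
 * thread, the signal stays pending and is never handled, because
 * work_l() never issues a syscall that would relax it to secondary mode.
 */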
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>		/* Needed for mlockall() */
#include <limits.h>
#include <pthread.h>
#include <signal.h>		/* pthread_kill(), SIGTERM */

#include "native/task.h"

#define MY_STACK_SIZE (100*1024)      /* 100 kB is enough for now. */

static pthread_t ph, pl;
static RT_TASK xh, xl;
static volatile int state = 0;

void *
work_h(void *cookie)
{
	if (rt_task_shadow(&xh, "high", 50, 0)) {
		printf("failed to shadow high\n");
		return NULL;
	}
	if (rt_task_set_periodic(&xh, TM_NOW, 1000000)) {
		printf("failed to set high periodic\n");
		return NULL;
	}
	while (1) {
		if (rt_task_wait_period(NULL)) {
			printf("wait_period failed\n");
			break;
		}

		switch (state) {
			case -1:
				if (rt_task_suspend(&xl)) {
					/* Fails with -EINTR once a Linux signal is
					 * pending for L; the workaround would go here. */
				}
				state = 1;
				break;
			case 1:
				rt_task_resume(&xl);
				state = -1;
				break;
			default:
				break;
		}
	}
	return NULL;
}

void *
work_l(void *cookie)
{
	if (rt_task_shadow(&xl, "low", 25, 0)) {
		printf("failed to shadow low\n");
		return NULL;
	}
	if (rt_task_set_mode(0, T_PRIMARY, NULL)) {
		printf("failed to migrate low\n");
		return NULL;
	}
	state = -1;
	while (1)
		;
	return NULL;
}

int main(void)
{
	pthread_attr_t threadattr;
	mlockall(MCL_CURRENT | MCL_FUTURE);

	pthread_attr_init(&threadattr);
	pthread_attr_setstacksize(&threadattr, MY_STACK_SIZE);

	pthread_create(&ph, &threadattr, work_h, NULL);
	printf("high prio watchdog started\n");
	pthread_create(&pl, &threadattr, work_l, NULL);
	printf("low prio work started\n");

	printf("Press <ENTER> to send a signal\n");
	getc(stdin);

	pthread_kill(pl, SIGTERM);

	/* You will not get here: once the signal is pending, work_h() can
	 * no longer suspend work_l(), which then eats up the CPU. */
	printf("Press <ENTER> to finish\n");
	getc(stdin);

	printf("main finished\n");
	return 0;
}


* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 15:54   ` Tschaeche IT-Services
@ 2010-06-01 16:52     ` Tschaeche IT-Services
  2010-06-01 16:58     ` Jan Kiszka
  1 sibling, 0 replies; 27+ messages in thread
From: Tschaeche IT-Services @ 2010-06-01 16:52 UTC (permalink / raw)
  To: Tschaeche IT-Services; +Cc: xenomai

On Tue, Jun 01, 2010 at 05:54:04PM +0200, Tschaeche IT-Services wrote:
> Then, we implement a workaround sending a rt-signal
> when rt_task_suspend() returns EINTR. In the rt-signal
> handler we explicitely migrate the task to secondary
> domain, where linux signal handling is triggered...

This does not work: rt_task_catch() is only available for kernel-based tasks :-(
Is there any other possibility to interrupt the task and switch it to
secondary domain?

Thanks,

	Olli



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 15:54   ` Tschaeche IT-Services
  2010-06-01 16:52     ` Tschaeche IT-Services
@ 2010-06-01 16:58     ` Jan Kiszka
  2010-06-02  8:36       ` Gilles Chanteperdrix
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2010-06-01 16:58 UTC (permalink / raw)
  To: Tschaeche IT-Services; +Cc: xenomai

Tschaeche IT-Services wrote:
> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>> Not in the absence of syscall. We thought about this once already, when
>> considering how a watchdog preempting a runaway task in primary mode
>> could force a secondary mode switch: there is no sane and easy solution
>> to this unfortunately.
> 
> This is exactly Sigmatek's problem: Our customers develop code
> within our debugging/development environment. We want to catch
> this situation (the developer implements a while(1)) with a
> watchdog throwing SIGTRAP so that our debugger gets active
> and can locate the problem according to the stack frame...

CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
to catch "well-behaving" broken threads via SIGDEBUG and kills the
hopelessly broken rest, so the system comes back alive.
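On the application side, a thread that still issues syscalls can catch SIGDEBUG and log where it was. A minimal sketch, assuming SIGDEBUG is SIGXCPU (as in the Xenomai 2.5 headers; prefer your installed headers' definition) and using glibc's backtrace() to dump the stack. The handler, counter and install function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd() (glibc) */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Assumption: Xenomai 2.5 defines SIGDEBUG as SIGXCPU. */
#ifndef SIGDEBUG
#define SIGDEBUG SIGXCPU
#endif

static volatile sig_atomic_t sigdebug_hits;

static void sigdebug_handler(int sig, siginfo_t *si, void *ctx)
{
	static const char msg[] = "SIGDEBUG received, backtrace:\n";
	void *frames[32];
	ssize_t w;
	int n;

	(void)sig;
	(void)si;   /* may carry a Xenomai-specific reason code; unused here */
	(void)ctx;
	sigdebug_hits++;
	w = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void)w;
	/* backtrace() is not strictly async-signal-safe; fine for a debug aid. */
	n = backtrace(frames, 32);
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

/* Returns 0 on success, -1 (errno set) if sigaction() fails. */
int install_sigdebug_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigdebug_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGDEBUG, &sa, NULL);
}
```

Call install_sigdebug_handler() early in main(), before creating the real-time threads; linking with -rdynamic lets backtrace_symbols_fd() print function names instead of bare addresses.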

You can then debug the former and need to do code review on the latter.
Or you could also try to add some loop-breaking Xenomai syscalls (or
even more clever checks) to library services the code under suspicion
usually invokes.
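The effect of loop-breaking syscalls can be shown with a small model in which every name is hypothetical and no Xenomai call is made. As explained earlier in this thread, a pending Linux signal is only acted upon when the thread passes through a syscall, so a pure busy loop is never relaxed, while a loop that issues a cheap "checkpoint" syscall every N iterations is relaxed at the first checkpoint after the signal arrives. Which real Xenomai service makes the cheapest checkpoint is left as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-thread state (hypothetical; models the mechanism only). */
struct sim_task {
	bool primary;      /* running in primary mode */
	bool sig_pending;  /* a Linux signal has been posted */
	long relaxed_at;   /* iteration at which it was relaxed, -1 if never */
};

/* Model of the syscall exit path: a pending Linux signal makes a
 * primary-mode thread relax (switch to secondary mode) here, and only here. */
static void sim_syscall(struct sim_task *t, long iter)
{
	if (t->primary && t->sig_pending) {
		t->primary = false;
		t->relaxed_at = iter;
	}
}

/*
 * Run `iters` iterations of busy work; a signal is posted at iteration
 * `sig_at`. If `every` > 0, a checkpoint syscall is issued every `every`
 * iterations; with `every` == 0 the loop never enters the kernel, like
 * the while(1) in work_l(). Returns the relaxation iteration, or -1.
 */
long sim_run(long iters, long sig_at, long every)
{
	struct sim_task t = { true, false, -1 };
	long i;

	assert(iters >= 0 && every >= 0);
	for (i = 0; i < iters && t.primary; i++) {
		if (i == sig_at)
			t.sig_pending = true;  /* e.g. pthread_kill() from another thread */
		/* ... user computation would go here ... */
		if (every > 0 && i % every == every - 1)
			sim_syscall(&t, i);
	}
	return t.relaxed_at;
}
```

The checkpoint interval trades overhead for reaction latency: with a checkpoint every 100 iterations, a signal posted at iteration 10 is acted upon at iteration 99, whereas `sim_run(1000, 10, 0)` never relaxes at all.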

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-01 16:58     ` Jan Kiszka
@ 2010-06-02  8:36       ` Gilles Chanteperdrix
  2010-06-02  9:14         ` Jan Kiszka
  2010-06-02  9:15         ` Philippe Gerum
  0 siblings, 2 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02  8:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Tschaeche IT-Services wrote:
>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>> Not in the absence of syscall. We thought about this once already, when
>>> considering how a watchdog preempting a runaway task in primary mode
>>> could force a secondary mode switch: there is no sane and easy solution
>>> to this unfortunately.
>> This is exactly Sigmatek's problem: Our customers develop code
>> within our debugging/development environment. We want to catch
>> this situation (the developer implements a while(1)) with a
>> watchdog throwing SIGTRAP so that our debugger gets active
>> and can locate the problem according to the stack frame...
> 
> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> hopelessly broken rest - system alive again.
> 
> You can then debug the former and need to do code review on the latter.
> Or you could also try to add some loop-breaking Xenomai syscalls (or
> even more clever checks) to library services the code under suspect
> usually invokes.

I am afraid "well-behaving" means emitting syscalls. We have a radical
way to cause a SIGSEGV to be sent to a thread having run amok: set its
PC to an invalid address (after having printed the real PC). gdb will
not be able to print where the program stopped, but should be able to
print the backtrace.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  8:36       ` Gilles Chanteperdrix
@ 2010-06-02  9:14         ` Jan Kiszka
  2010-06-02  9:15         ` Philippe Gerum
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Kiszka @ 2010-06-02  9:14 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Tschaeche IT-Services wrote:
>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>> Not in the absence of syscall. We thought about this once already, when
>>>> considering how a watchdog preempting a runaway task in primary mode
>>>> could force a secondary mode switch: there is no sane and easy solution
>>>> to this unfortunately.
>>> This is exactly Sigmatek's problem: Our customers develop code
>>> within our debugging/development environment. We want to catch
>>> this situation (the developer implements a while(1)) with a
>>> watchdog throwing SIGTRAP so that our debugger gets active
>>> and can locate the problem according to the stack frame...
>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>> hopelessly broken rest - system alive again.
>>
>> You can then debug the former and need to do code review on the latter.
>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>> even more clever checks) to library services the code under suspect
>> usually invokes.
> 
> I am afraid "well-behaving" means emitting syscalls. We have a radical
> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> PC to an invalid address (after having printed the real PC). gdb will
> not be able to print where the program stopped, but should be able to
> print the backtrace.

Just discussing this with our customer raised spontaneous interest (due
to the yet unsolved switching issue with non-RT Xenomai threads). I'm
going to look into this, also trying to find some more sophisticated
approaches, e.g. simulating a call to preserve the call trace (which
would make it really useful) or jumping to some helper function that
issues a syscall.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  8:36       ` Gilles Chanteperdrix
  2010-06-02  9:14         ` Jan Kiszka
@ 2010-06-02  9:15         ` Philippe Gerum
  2010-06-02  9:20           ` Jan Kiszka
                             ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02  9:15 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
> > Tschaeche IT-Services wrote:
> >> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>> Not in the absence of syscall. We thought about this once already, when
> >>> considering how a watchdog preempting a runaway task in primary mode
> >>> could force a secondary mode switch: there is no sane and easy solution
> >>> to this unfortunately.
> >> This is exactly Sigmatek's problem: Our customers develop code
> >> within our debugging/development environment. We want to catch
> >> this situation (the developer implements a while(1)) with a
> >> watchdog throwing SIGTRAP so that our debugger gets active
> >> and can locate the problem according to the stack frame...
> > 
> > CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> > to catch "well-behaving" broken threads via SIGDEBUG and kills the
> > hopelessly broken rest - system alive again.
> > 
> > You can then debug the former and need to do code review on the latter.
> > Or you could also try to add some loop-breaking Xenomai syscalls (or
> > even more clever checks) to library services the code under suspect
> > usually invokes.
> 
> I am afraid "well-behaving" means emitting syscalls. We have a radical
> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> PC to an invalid address (after having printed the real PC). gdb will
> not be able to print where the program stopped, but should be able to
> print the backtrace.
> 

Actually, we could extend this logic and forge a stack frame to return
to the preempted application code via some userland trampoline code,
doing the switch:

[watchdog trigger]
	forge_return_frame(on =regs->sp, to =regs->pc);
	regs->pc = __oops_I_did_it_again;

__oops_I_did_it_again:
	__xn_migrate(LINUX_DOMAIN);
	ret (via forged frame)

The catch is that this brings in some arch-dependent code to forge a stack
frame (like the kernel uses for signals), which should rather live in the
pipeline core.
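The trampoline idea above can be modeled in plain C to show the bookkeeping. This is a simulation only: the register set, stack and all names are hypothetical, and real code would be arch-specific, as noted. On watchdog trigger, the interrupted PC is pushed onto the thread's stack as a return address and the PC is redirected to a userland trampoline; the trampoline performs the migration and then returns through the forged frame to where the loop was preempted.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated user register set and stack (hypothetical, arch-neutral). */
struct sim_regs {
	uintptr_t pc;
	uintptr_t sp;          /* index into stack[], grows downwards */
	uintptr_t stack[64];
};

#define TRAMPOLINE_PC ((uintptr_t)0x1000)  /* hypothetical trampoline address */

/* Watchdog side: forge a return frame holding the preempted PC,
 * then divert the thread to the trampoline. */
void forge_return_frame(struct sim_regs *r)
{
	assert(r->sp > 0);
	r->stack[--r->sp] = r->pc;   /* push return address */
	r->pc = TRAMPOLINE_PC;
}

/* Trampoline side: migrate to Linux, then return via the forged frame.
 * Setting *migrated stands in for the __xn_migrate(LINUX_DOMAIN) step. */
void run_trampoline(struct sim_regs *r, int *migrated)
{
	assert(r->pc == TRAMPOLINE_PC);
	assert(r->sp < sizeof r->stack / sizeof r->stack[0]);
	*migrated = 1;
	r->pc = r->stack[r->sp++];   /* "ret": pop the forged return address */
}
```

Because the forged return address sits on the stack like a real call frame, an unwinder stopping inside the trampoline would see the preempted location as its caller, which is presumably what keeps the call trace usable.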


-- 
Philippe.





* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:15         ` Philippe Gerum
@ 2010-06-02  9:20           ` Jan Kiszka
  2010-06-02  9:28             ` Philippe Gerum
  2010-06-02  9:21           ` Gilles Chanteperdrix
  2010-06-02 12:02           ` Daniele Nicolodi
  2 siblings, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2010-06-02  9:20 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Tschaeche IT-Services wrote:
>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>> to this unfortunately.
>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>> within our debugging/development environment. We want to catch
>>>> this situation (the developer implements a while(1)) with a
>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>> and can locate the problem according to the stack frame...
>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>> hopelessly broken rest - system alive again.
>>>
>>> You can then debug the former and need to do code review on the latter.
>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>> even more clever checks) to library services the code under suspect
>>> usually invokes.
>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>> PC to an invalid address (after having printed the real PC). gdb will
>> not be able to print where the program stopped, but should be able to
>> print the backtrace.
>>
> 
> Actually, we could extend this logic and forge a stack frame to return
> to the preempted application code via some userland trampoline code,
> doing the switch:
> 
> [watchdog trigger]
> 	forge_return_frame(on =regs->sp, to =regs->pc);
> 	regs->pc = __oops_I_did_it_again;
> 
> __oops_I_did_it_again:
> 	__xn_migrate(LINUX_DOMAIN);
> 	ret (via forged frame)

Yep, that's what came to my mind as well. But the __oops_I_did_it_again
part has to reside in user space, no?

> 
> The thing is, that this brings in some arch-dep code to forge a stack
> frame (like the kernel uses for signals), that should rather live in the
> pipeline core.

Actually, we are then close to enabling signal delivery outside syscalls...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:15         ` Philippe Gerum
  2010-06-02  9:20           ` Jan Kiszka
@ 2010-06-02  9:21           ` Gilles Chanteperdrix
  2010-06-02  9:23             ` Jan Kiszka
  2010-06-02  9:34             ` Philippe Gerum
  2010-06-02 12:02           ` Daniele Nicolodi
  2 siblings, 2 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02  9:21 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Tschaeche IT-Services wrote:
>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>> to this unfortunately.
>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>> within our debugging/development environment. We want to catch
>>>> this situation (the developer implements a while(1)) with a
>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>> and can locate the problem according to the stack frame...
>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>> hopelessly broken rest - system alive again.
>>>
>>> You can then debug the former and need to do code review on the latter.
>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>> even more clever checks) to library services the code under suspect
>>> usually invokes.
>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>> PC to an invalid address (after having printed the real PC). gdb will
>> not be able to print where the program stopped, but should be able to
>> print the backtrace.
>>
> 
> Actually, we could extend this logic and forge a stack frame to return
> to the preempted application code via some userland trampoline code,
> doing the switch:
> 
> [watchdog trigger]
> 	forge_return_frame(on =regs->sp, to =regs->pc);
> 	regs->pc = __oops_I_did_it_again;
> 
> __oops_I_did_it_again:
> 	__xn_migrate(LINUX_DOMAIN);
> 	ret (via forged frame)
> 
> The thing is, that this brings in some arch-dep code to forge a stack
> frame (like the kernel uses for signals), that should rather live in the
> pipeline core.

There seems to be a simple approach:
- when the thread runs amok, save its real PC somewhere and set its PC
  to an invalid address;
- when it relaxes to handle the resulting exception (xnpod_trap_fault),
  if the amok bit is set, restore the PC in the saved registers from
  the saved location.
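This variant avoids forging a frame: the watchdog only stashes the real PC and plants an invalid one, and the fault path puts the real PC back once the thread has relaxed. A simulation of that bookkeeping follows; the struct, field and function names are hypothetical, and in Xenomai the two hooks would be the watchdog handler and xnpod_trap_fault().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_PC ((uintptr_t)0)  /* hypothetical: an address guaranteed to fault */

struct amok_state {
	uintptr_t pc;        /* user PC in the saved register frame */
	bool amok;           /* set by the watchdog */
	uintptr_t saved_pc;  /* real PC stashed by the watchdog */
	bool relaxed;
};

/* Watchdog: remember where the thread really was, then make its next
 * instruction fetch fault so that it enters the fault path. */
void watchdog_fire(struct amok_state *t)
{
	t->saved_pc = t->pc;
	t->pc = INVALID_PC;
	t->amok = true;
}

/* Fault path: relax the thread and, if the fault was planted by the
 * watchdog, restore the real PC so the debugger sees the actual location.
 * Returns true if the fault was planted, false for a genuine fault. */
bool trap_fault(struct amok_state *t)
{
	t->relaxed = true;
	if (t->amok) {
		t->pc = t->saved_pc;
		t->amok = false;
		return true;
	}
	return false;
}
```

The amok flag is what distinguishes a planted fault from a genuine one, so a real bad pointer dereference is left untouched.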

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:21           ` Gilles Chanteperdrix
@ 2010-06-02  9:23             ` Jan Kiszka
  2010-06-02 10:19               ` Tschaeche IT-Services
  2010-06-02  9:34             ` Philippe Gerum
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Kiszka @ 2010-06-02  9:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Tschaeche IT-Services wrote:
>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>> to this unfortunately.
>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>> within our debugging/development environment. We want to catch
>>>>> this situation (the developer implements a while(1)) with a
>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>> and can locate the problem according to the stack frame...
>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>> hopelessly broken rest - system alive again.
>>>>
>>>> You can then debug the former and need to do code review on the latter.
>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>> even more clever checks) to library services the code under suspect
>>>> usually invokes.
>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>> PC to an invalid address (after having printed the real PC). gdb will
>>> not be able to print where the program stopped, but should be able to
>>> print the backtrace.
>>>
>> Actually, we could extend this logic and forge a stack frame to return
>> to the preempted application code via some userland trampoline code,
>> doing the switch:
>>
>> [watchdog trigger]
>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>> 	regs->pc = __oops_I_did_it_again;
>>
>> __oops_I_did_it_again:
>> 	__xn_migrate(LINUX_DOMAIN);
>> 	ret (via forged frame)
>>
>> The thing is, that this brings in some arch-dep code to forge a stack
>> frame (like the kernel uses for signals), that should rather live in the
>> pipeline core.
> 
> There seems to be a simple approach:
> when the thread runs amok, set the pc to invalid address, save the real
> pc somewhere
> when relaxing for handling the exception (xnpod_trap_fault), if the amok
> bit is set, restore the pc in the saved registers from the saved location.

Sounds feasible, will give it a try.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:20           ` Jan Kiszka
@ 2010-06-02  9:28             ` Philippe Gerum
  2010-06-02  9:37               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02  9:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >> Jan Kiszka wrote:
> >>> Tschaeche IT-Services wrote:
> >>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>> Not in the absence of syscall. We thought about this once already, when
> >>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>> to this unfortunately.
> >>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>> within our debugging/development environment. We want to catch
> >>>> this situation (the developer implements a while(1)) with a
> >>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>> and can locate the problem according to the stack frame...
> >>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>> hopelessly broken rest - system alive again.
> >>>
> >>> You can then debug the former and need to do code review on the latter.
> >>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>> even more clever checks) to library services the code under suspect
> >>> usually invokes.
> >> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >> PC to an invalid address (after having printed the real PC). gdb will
> >> not be able to print where the program stopped, but should be able to
> >> print the backtrace.
> >>
> > 
> > Actually, we could extend this logic and forge a stack frame to return
> > to the preempted application code via some userland trampoline code,
> > doing the switch:
> > 
> > [watchdog trigger]
> > 	forge_return_frame(on =regs->sp, to =regs->pc);
> > 	regs->pc = __oops_I_did_it_again;
> > 
> > __oops_I_did_it_again:
> > 	__xn_migrate(LINUX_DOMAIN);
> > 	ret (via forged frame)
> 
> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
> part has to reside in user space, no?

Clearly, yes. Either we map this explicitly, or we just make sure to
compile it in each app, and pass its address at skin binding time. Our
text is mmlocked anyway.

> 
> > 
> > The thing is, that this brings in some arch-dep code to forge a stack
> > frame (like the kernel uses for signals), that should rather live in the
> > pipeline core.
> 
> Actually, we are then close to enabling signal delivery outside syscalls...
> 

Yes, looks like.

> Jan
> 


-- 
Philippe.





* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:21           ` Gilles Chanteperdrix
  2010-06-02  9:23             ` Jan Kiszka
@ 2010-06-02  9:34             ` Philippe Gerum
  2010-06-02  9:43               ` Gilles Chanteperdrix
  1 sibling, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02  9:34 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

On Wed, 2010-06-02 at 11:21 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >> Jan Kiszka wrote:
> >>> Tschaeche IT-Services wrote:
> >>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>> Not in the absence of syscall. We thought about this once already, when
> >>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>> to this unfortunately.
> >>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>> within our debugging/development environment. We want to catch
> >>>> this situation (the developer implements a while(1)) with a
> >>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>> and can locate the problem according to the stack frame...
> >>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>> hopelessly broken rest - system alive again.
> >>>
> >>> You can then debug the former and need to do code review on the latter.
> >>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>> even more clever checks) to library services the code under suspicion
> >>> usually invokes.
> >> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >> PC to an invalid address (after having printed the real PC). gdb will
> >> not be able to print where the program stopped, but should be able to
> >> print the backtrace.
> >>
> > 
> > Actually, we could extend this logic and forge a stack frame to return
> > to the preempted application code via some userland trampoline code,
> > doing the switch:
> > 
> > [watchdog trigger]
> > 	forge_return_frame(on =regs->sp, to =regs->pc);
> > 	regs->pc = __oops_I_did_it_again;
> > 
> > __oops_I_did_it_again:
> > 	__xn_migrate(LINUX_DOMAIN);
> > 	ret (via forged frame)
> > 
> > The thing is, that this brings in some arch-dep code to forge a stack
> > frame (like the kernel uses for signals), that should rather live in the
> > pipeline core.
> 
> There seems to be a simple approach:
> when the thread runs amok, set the pc to an invalid address, save the real
> pc somewhere
> when relaxing for handling the exception (xnpod_trap_fault), if the amok
> bit is set, restore the pc in the saved registers from the saved location.
> 

It's indeed simpler. The limitation of this approach is that it counts on
correct behaviour of the fault mechanism, since we would implicitly rely on it
to deal with the mode switch. By "correct", I mean: the instruction
fetch fault must be detectable and recoverable the same way, regardless
of the architecture.


-- 
Philippe.





* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:28             ` Philippe Gerum
@ 2010-06-02  9:37               ` Gilles Chanteperdrix
  2010-06-02 10:06                 ` Philippe Gerum
  0 siblings, 1 reply; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02  9:37 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Tschaeche IT-Services wrote:
>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>>> to this unfortunately.
>>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>>> within our debugging/development environment. We want to catch
>>>>>> this situation (the developer implements a while(1)) with a
>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>>> and can locate the problem according to the stack frame...
>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>>> hopelessly broken rest - system alive again.
>>>>>
>>>>> You can then debug the former and need to do code review on the latter.
>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>>> even more clever checks) to library services the code under suspicion
>>>>> usually invokes.
>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>>> PC to an invalid address (after having printed the real PC). gdb will
>>>> not be able to print where the program stopped, but should be able to
>>>> print the backtrace.
>>>>
>>> Actually, we could extend this logic and forge a stack frame to return
>>> to the preempted application code via some userland trampoline code,
>>> doing the switch:
>>>
>>> [watchdog trigger]
>>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>>> 	regs->pc = __oops_I_did_it_again;
>>>
>>> __oops_I_did_it_again:
>>> 	__xn_migrate(LINUX_DOMAIN);
>>> 	ret (via forged frame)
>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
>> part has to reside in user space, no?
> 
> Clearly, yes. Either we map this explicitly, or we just make sure to
> compile it in each app, and pass its address at skin binding time. Our
> text is mmlocked anyway.
> 
>>> The thing is, that this brings in some arch-dep code to forge a stack
>>> frame (like the kernel uses for signals), that should rather live in the
>>> pipeline core.
>> Actually, we are then close to enabling signal delivery outside syscalls...
>>
> 
> Yes, looks like.

When thinking about this real-signals thing, I was thinking about
putting the forging code into Xenomai (the code is the same for all
kernel versions, so there is no reason to put it into the I-pipe, and we
may have to emit a special syscall to restore the context when handling
the signal is done). What we need the I-pipe for, however, is to trigger
some event on the way back to user-space.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:34             ` Philippe Gerum
@ 2010-06-02  9:43               ` Gilles Chanteperdrix
  0 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02  9:43 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 11:21 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Tschaeche IT-Services wrote:
>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>>> to this unfortunately.
>>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>>> within our debugging/development environment. We want to catch
>>>>>> this situation (the developer implements a while(1)) with a
>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>>> and can locate the problem according to the stack frame...
>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>>> hopelessly broken rest - system alive again.
>>>>>
>>>>> You can then debug the former and need to do code review on the latter.
>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>>> even more clever checks) to library services the code under suspicion
>>>>> usually invokes.
>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>>> PC to an invalid address (after having printed the real PC). gdb will
>>>> not be able to print where the program stopped, but should be able to
>>>> print the backtrace.
>>>>
>>> Actually, we could extend this logic and forge a stack frame to return
>>> to the preempted application code via some userland trampoline code,
>>> doing the switch:
>>>
>>> [watchdog trigger]
>>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>>> 	regs->pc = __oops_I_did_it_again;
>>>
>>> __oops_I_did_it_again:
>>> 	__xn_migrate(LINUX_DOMAIN);
>>> 	ret (via forged frame)
>>>
>>> The thing is, that this brings in some arch-dep code to forge a stack
>>> frame (like the kernel uses for signals), that should rather live in the
>>> pipeline core.
>> There seems to be a simple approach:
>> when the thread runs amok, set the pc to an invalid address, save the real
>> pc somewhere
>> when relaxing for handling the exception (xnpod_trap_fault), if the amok
>> bit is set, restore the pc in the saved registers from the saved location.
>>
> 
> It's indeed simpler. The limitation of this approach is that it counts on
> correct behaviour of the fault mechanism, since we would implicitly rely on it
> to deal with the mode switch. By "correct", I mean: the instruction
> fetch fault must be detectable and recoverable the same way, regardless
> of the architecture.

Yes, if the kernel looks at what is under the PC to handle the fault, we
are toast because it will probably do it after we have restored the real PC.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:37               ` Gilles Chanteperdrix
@ 2010-06-02 10:06                 ` Philippe Gerum
  2010-06-02 10:19                   ` Gilles Chanteperdrix
  2010-06-02 10:29                   ` Gilles Chanteperdrix
  0 siblings, 2 replies; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02 10:06 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
> >> Philippe Gerum wrote:
> >>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >>>> Jan Kiszka wrote:
> >>>>> Tschaeche IT-Services wrote:
> >>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>>>> Not in the absence of syscall. We thought about this once already, when
> >>>>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>>>> to this unfortunately.
> >>>>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>>>> within our debugging/development environment. We want to catch
> >>>>>> this situation (the developer implements a while(1)) with a
> >>>>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>>>> and can locate the problem according to the stack frame...
> >>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>>>> hopelessly broken rest - system alive again.
> >>>>>
> >>>>> You can then debug the former and need to do code review on the latter.
> >>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>>>> even more clever checks) to library services the code under suspicion
> >>>>> usually invokes.
> >>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >>>> PC to an invalid address (after having printed the real PC). gdb will
> >>>> not be able to print where the program stopped, but should be able to
> >>>> print the backtrace.
> >>>>
> >>> Actually, we could extend this logic and forge a stack frame to return
> >>> to the preempted application code via some userland trampoline code,
> >>> doing the switch:
> >>>
> >>> [watchdog trigger]
> >>> 	forge_return_frame(on =regs->sp, to =regs->pc);
> >>> 	regs->pc = __oops_I_did_it_again;
> >>>
> >>> __oops_I_did_it_again:
> >>> 	__xn_migrate(LINUX_DOMAIN);
> >>> 	ret (via forged frame)
> >> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
> >> part has to reside in user space, no?
> > 
> > Clearly, yes. Either we map this explicitly, or we just make sure to
> > compile it in each app, and pass its address at skin binding time. Our
> > text is mmlocked anyway.
> > 
> >>> The thing is, that this brings in some arch-dep code to forge a stack
> >>> frame (like the kernel uses for signals), that should rather live in the
> >>> pipeline core.
> >> Actually, we are then close to enabling signal delivery outside syscalls...
> >>
> > 
> > Yes, looks like.
> 
> When thinking about this real-signals thing, I was thinking about
> putting the forging code into Xenomai (the code is the same for all
> kernel versions, so there is no reason to put it into the I-pipe, and we
> may have to emit a special syscall to restore the context when handling
> the signal is done). What we need the I-pipe for, however, is to trigger
> some event on the way back to user-space.
> 

One reason to have this code in the pipeline core is that we would otherwise
duplicate the setup_rt_frame code already available from the vanilla
kernel. It's a bit like xnarch_switch_to: we used to open code most of
it in our arch-dep code, mostly duplicating the vanilla switch code, but
having switch_mm() ironed out enough - on arm and powerpc at least - to be
callable from the Xenomai domain as well proved to be a serious relief.

Granted, the signal code is unlikely to change a lot, given the strong
ABI requirements this has wrt the glibc, but I'm always reluctant to
introduce duplicates at both ends of the system; I would rather factor
out that code and make it available to both domains, if that makes
sense.

-- 
Philippe.





* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:23             ` Jan Kiszka
@ 2010-06-02 10:19               ` Tschaeche IT-Services
  2010-06-02 10:48                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Tschaeche IT-Services @ 2010-06-02 10:19 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Wed, Jun 02, 2010 at 11:23:51AM +0200, Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
> > Philippe Gerum wrote:
> >> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >>> Jan Kiszka wrote:
> >>>> Tschaeche IT-Services wrote:
> >>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>>> Not in the absence of syscall. We thought about this once already, when
> >>>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>>> to this unfortunately.
> >>>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>>> within our debugging/development environment. We want to catch
> >>>>> this situation (the developer implements a while(1)) with a
> >>>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>>> and can locate the problem according to the stack frame...
> >>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>>> hopelessly broken rest - system alive again.
> >>>>
> >>>> You can then debug the former and need to do code review on the latter.
> >>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>>> even more clever checks) to library services the code under suspicion
> >>>> usually invokes.
> >>> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >>> PC to an invalid address (after having printed the real PC). gdb will
> >>> not be able to print where the program stopped, but should be able to
> >>> print the backtrace.
> >>>
> >> Actually, we could extend this logic and forge a stack frame to return
> >> to the preempted application code via some userland trampoline code,
> >> doing the switch:
> >>
> >> [watchdog trigger]
> >> 	forge_return_frame(on =regs->sp, to =regs->pc);
> >> 	regs->pc = __oops_I_did_it_again;
> >>
> >> __oops_I_did_it_again:
> >> 	__xn_migrate(LINUX_DOMAIN);
> >> 	ret (via forged frame)
> >>
> >> The thing is, that this brings in some arch-dep code to forge a stack
> >> frame (like the kernel uses for signals), that should rather live in the
> >> pipeline core.
> > 
> > There seems to be a simple approach:
> > when the thread runs amok, set the pc to an invalid address, save the real
> > pc somewhere
> > when relaxing for handling the exception (xnpod_trap_fault), if the amok
> > bit is set, restore the pc in the saved registers from the saved location.
> 
> Sounds feasible, will give it a try.

Looking at your discussion, handling asynchronous Linux signals in a primary domain task
by initiating the signal handling in secondary domain *immediately* is not a "must"
for Xenomai (though it would be nice).

Another solution might be to check the state of the AMOK task whenever Xenomai
schedules it for execution. If Linux signals are pending, force a secondary
domain switch. Thus, asynchronous Linux signals would be handled at the latest on
the next primary domain scheduler activity, which would be sufficient for us...

Regards,

	Olli

-- 
Tschaeche IT-Services       Tel.:  +49/9134/9089850
Dr.-Ing. Oliver Tschäche    Mobil: +49/176/20435601
Welluckenweg 4              Email: services@domain.hid
91077 Neunkirchen



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 10:06                 ` Philippe Gerum
@ 2010-06-02 10:19                   ` Gilles Chanteperdrix
  2010-06-02 10:42                     ` Philippe Gerum
  2010-06-02 10:29                   ` Gilles Chanteperdrix
  1 sibling, 1 reply; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 10:19 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Tschaeche IT-Services wrote:
>>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>>>>> to this unfortunately.
>>>>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>>>>> within our debugging/development environment. We want to catch
>>>>>>>> this situation (the developer implements a while(1)) with a
>>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>>>>> and can locate the problem according to the stack frame...
>>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>>>>> hopelessly broken rest - system alive again.
>>>>>>>
>>>>>>> You can then debug the former and need to do code review on the latter.
>>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>>>>> even more clever checks) to library services the code under suspicion
>>>>>>> usually invokes.
>>>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>>>>> PC to an invalid address (after having printed the real PC). gdb will
>>>>>> not be able to print where the program stopped, but should be able to
>>>>>> print the backtrace.
>>>>>>
>>>>> Actually, we could extend this logic and forge a stack frame to return
>>>>> to the preempted application code via some userland trampoline code,
>>>>> doing the switch:
>>>>>
>>>>> [watchdog trigger]
>>>>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>>>>> 	regs->pc = __oops_I_did_it_again;
>>>>>
>>>>> __oops_I_did_it_again:
>>>>> 	__xn_migrate(LINUX_DOMAIN);
>>>>> 	ret (via forged frame)
>>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
>>>> part has to reside in user space, no?
>>> Clearly, yes. Either we map this explicitly, or we just make sure to
>>> compile it in each app, and pass its address at skin binding time. Our
>>> text is mmlocked anyway.
>>>
>>>>> The thing is, that this brings in some arch-dep code to forge a stack
>>>>> frame (like the kernel uses for signals), that should rather live in the
>>>>> pipeline core.
>>>> Actually, we are then close to enabling signal delivery outside syscalls...
>>>>
>>> Yes, looks like.
>> When thinking about this real-signals thing, I was thinking about
>> putting the forging code into Xenomai (the code is the same for all
>> kernel versions, so there is no reason to put it into the I-pipe, and we
>> may have to emit a special syscall to restore the context when handling
>> the signal is done). What we need the I-pipe for, however, is to trigger
>> some event on the way back to user-space.
>>
> 
> One reason to have this code in the pipeline core is that we would otherwise
> duplicate the setup_rt_frame code already available from the vanilla
> kernel. It's a bit like xnarch_switch_to: we used to open code most of
> it in our arch-dep code, mostly duplicating the vanilla switch code, but
> having switch_mm() ironed out enough - on arm and powerpc at least - to be
> callable from the Xenomai domain as well proved to be a serious relief.
> 
> Granted, the signal code is unlikely to change a lot, given the strong
> ABI requirements this has wrt the glibc, but I'm always reluctant to
> introduce duplicates at both ends of the system; I would rather factor
> out that code and make it available to both domains, if that makes
> sense.

I am not sure it really makes sense: the biggest part of the Linux code
is used to set up the special frame passed as the last void * argument of
signal handlers installed with the SA_SIGINFO option, allowing (among other
things) signal handlers to use setcontext() to implement co-routines, and I am
not sure we really want that. And if you do some major revamping of the
Linux stack frame building functions, you will have merge conflicts every
time you upgrade the I-pipe patch.

Besides, we still have the return-through-syscall issue: returning from
the signal handler cannot be a simple "return" instruction, since we
have to save and restore most registers.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 10:06                 ` Philippe Gerum
  2010-06-02 10:19                   ` Gilles Chanteperdrix
@ 2010-06-02 10:29                   ` Gilles Chanteperdrix
  1 sibling, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 10:29 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Tschaeche IT-Services wrote:
>>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>>>>> Not in the absence of syscall. We thought about this once already, when
>>>>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>>>>> to this unfortunately.
>>>>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>>>>> within our debugging/development environment. We want to catch
>>>>>>>> this situation (the developer implements a while(1)) with a
>>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>>>>> and can locate the problem according to the stack frame...
>>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>>>>> hopelessly broken rest - system alive again.
>>>>>>>
>>>>>>> You can then debug the former and need to do code review on the latter.
>>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>>>>> even more clever checks) to library services the code under suspicion
>>>>>>> usually invokes.
>>>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>>>>> PC to an invalid address (after having printed the real PC). gdb will
>>>>>> not be able to print where the program stopped, but should be able to
>>>>>> print the backtrace.
>>>>>>
>>>>> Actually, we could extend this logic and forge a stack frame to return
>>>>> to the preempted application code via some userland trampoline code,
>>>>> doing the switch:
>>>>>
>>>>> [watchdog trigger]
>>>>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>>>>> 	regs->pc = __oops_I_did_it_again;
>>>>>
>>>>> __oops_I_did_it_again:
>>>>> 	__xn_migrate(LINUX_DOMAIN);
>>>>> 	ret (via forged frame)
>>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
>>>> part has to reside in user space, no?
>>> Clearly, yes. Either we map this explicitly, or we just make sure to
>>> compile it in each app, and pass its address at skin binding time. Our
>>> text is mmlocked anyway.
>>>
>>>>> The thing is, that this brings in some arch-dep code to forge a stack
>>>>> frame (like the kernel uses for signals), that should rather live in the
>>>>> pipeline core.
>>>> Actually, we are then close to enabling signal delivery outside syscalls...
>>>>
>>> Yes, looks like.
>> When thinking about this real-signals thing, I was thinking about
>> putting the forging code into Xenomai (the code is the same for all
>> kernel versions, so there is no reason to put it into the I-pipe, and we
>> may have to emit a special syscall to restore the context when handling
>> the signal is done). What we need the I-pipe for, however, is to trigger
>> some event on the way back to user-space.
>>
> 
> One reason to have this code in the pipeline core is that we would otherwise
> duplicate the setup_rt_frame code already available from the vanilla
> kernel. It's a bit like xnarch_switch_to: we used to open code most of
> it in our arch-dep code, mostly duplicating the vanilla switch code, but
> having switch_mm() ironed out enough - on arm and powerpc at least - to be
> callable from the Xenomai domain as well proved to be a serious relief.
> 
> Granted, the signal code is unlikely to change a lot, given the strong
> ABI requirements this has wrt the glibc, but I'm always reluctant to
> introduce duplicates at both ends of the system; I would rather factor
> out that code and make it available to both domains, if that makes
> sense.

I had even written a piece of code for x86 (completely untested).

#include <asm/ptrace.h>

#define __FIX_EFLAGS	(X86_EFLAGS_AC | X86_EFLAGS_OF | \
			 X86_EFLAGS_DF | X86_EFLAGS_TF | X86_EFLAGS_SF | \
			 X86_EFLAGS_ZF | X86_EFLAGS_AF | X86_EFLAGS_PF | \
			 X86_EFLAGS_CF)

#ifdef CONFIG_X86_32
# define FIX_EFLAGS	(__FIX_EFLAGS | X86_EFLAGS_RF)
#else
# define FIX_EFLAGS	__FIX_EFLAGS
#endif

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 11)
#define hal_fpu_init_p(task)   ((task)->used_math)
#define hal_set_fpu_init(task) ((task)->used_math = 1)
#else
#define hal_fpu_init_p(task)   tsk_used_math(task)
#define hal_set_fpu_init(task) set_stopped_child_used_math(task)
#endif

void __user *hal_push(struct pt_regs *regs, void *chunk, size_t size)
{
	unsigned long sp = regs->sp;

	sp -= size;
	if (__xn_copy_to_user((void __user *)sp, chunk, size))
		return ERR_PTR(-EFAULT);
	
	regs->sp = sp;

	return (void __user *)sp;
}

#ifdef CONFIG_X86_32
struct sigtest_sigframe {
	u32 pretcoder;
	void *arg1;
	void *arg2;
	void __user *math;
	struct pt_regs regs;
};

static unsigned long align_sigframe(unsigned long sp)
{
	return ((sp + 4) & -16ul) - 4;
}

void hal_save_fpu(x86_fpustate *fpup)
{
	if (cpu_has_fxsr)
		__asm__ __volatile__("fxsave %0; fnclex":"=m"(*fpup));
	else
		__asm__ __volatile__("fnsave %0; fwait":"=m"(*fpup));
}

void hal_restore_fpu(x86_fpustate *fpup)
{
	clts();

	if (cpu_has_fxsr)
		__asm__ __volatile__("fxrstor %0": /* no output */ :"m"(*fpup));
	else
		__asm__ __volatile__("frstor %0": /* no output */ :"m"(*fpup));
}

void hal_init_fpu(void)
{
	__asm__ __volatile__("clts; fninit");

	if (cpu_has_xmm) {
		unsigned long __mxcsr = 0x1f80UL & 0xffbfUL;
		__asm__ __volatile__("ldmxcsr %0"::"m"(__mxcsr));
	}
}

int hal_trigger_cb(struct pt_regs *regs, x86_fpustate *fpup,
		   void __user *cb, void __user *ret, void *arg1, void *arg2)
{
	struct sigtest_sigframe __user *frame;
	struct sigtest_sigframe k_frame;	/* kernel copy of the frame header */
	unsigned long sp = regs->sp;
	unsigned long flags;

	local_irq_save_hw(flags);
	if (wrap_test_fpu_used(current) || hal_fpu_init_p(current)) {
		if (wrap_test_fpu_used(current)) {
			hal_save_fpu(fpup);
			wrap_clear_fpu_used(current);
		}
		/* Reserve room below the interrupted stack for the FPU state,
		   instead of overwriting the live data at regs->sp. */
		sp -= sizeof(*fpup);
		if (__xn_copy_to_user((void __user *)sp, fpup, sizeof(*fpup))) {
			local_irq_restore_hw(flags);
			return -EFAULT;
		}
		k_frame.math = (void __user *)sp;
	} else
		k_frame.math = NULL;
	local_irq_restore_hw(flags);

	sp = align_sigframe(sp - sizeof(*frame));

	frame = (struct sigtest_sigframe __user *)sp;

	/* cb returns to the userland trampoline at "ret". */
	k_frame.pretcoder = (u32)(unsigned long)ret;
	k_frame.arg1 = arg1;
	k_frame.arg2 = arg2;

	if (__xn_copy_to_user(frame, &k_frame,
			      offsetof(struct sigtest_sigframe, regs)))
		return -EFAULT;

	/* Save the interrupted user registers for hal_restore_regs(). */
	if (__xn_copy_to_user(&frame->regs, regs, sizeof(*regs)))
		return -EFAULT;

	/* Enter cb with arg1/arg2 in eax/edx (regparm(3) convention). */
	regs->sp = sp;
	regs->ip = (unsigned long)cb;
	regs->ax = (unsigned long)arg1;
	regs->dx = (unsigned long)arg2;
	regs->cx = 0;

	regs->ds = __USER_DS;
	regs->es = __USER_DS;
	regs->ss = __USER_DS;
	regs->cs = __USER_CS;

	return 0;
}

int hal_restore_regs(struct pt_regs *regs, void *fpup)
{
	struct sigtest_sigframe __user *frame;
	unsigned long orig_flags;
	unsigned long flags;
	void __user *math;

	frame = (struct sigtest_sigframe __user *)(regs->sp - 8);

	orig_flags = regs->flags;

	if (__xn_copy_from_user(&math, &frame->math, sizeof(math)))
		return -EFAULT;
	if (__xn_copy_from_user(regs, &frame->regs, sizeof(*regs)))
		return -EFAULT;

	set_user_gs(regs, regs->gs);
	regs->cs |= 3;
	regs->ss |= 3;
	regs->flags = (orig_flags & ~FIX_EFLAGS) | (regs->flags & FIX_EFLAGS);

	local_irq_save_hw(flags);
	if (math) {
		if (__xn_copy_from_user(fpup, math, sizeof(*fpup))) {
			local_irq_restore_hw(flags);
			return -EFAULT;
		}
		hal_restore_fpu(fpup);
	} else if (hal_fpu_init_p(current)) {
		/* sighandler used fpu, restore the init state. */
		hal_init_fpu();
		wrap_set_fpu_used(current);
	}
	local_irq_restore_hw(flags);

	return 0;
}
#else /* CONFIG_X86_64 */
#endif /* CONFIG_X86_64 */


> 


-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 10:19                   ` Gilles Chanteperdrix
@ 2010-06-02 10:42                     ` Philippe Gerum
  2010-06-02 10:51                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02 10:42 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

On Wed, 2010-06-02 at 12:19 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote:
> >> Philippe Gerum wrote:
> >>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
> >>>> Philippe Gerum wrote:
> >>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
> >>>>>> Jan Kiszka wrote:
> >>>>>>> Tschaeche IT-Services wrote:
> >>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
> >>>>>>>>> Not in the absence of syscalls. We thought about this once already, when
> >>>>>>>>> considering how a watchdog preempting a runaway task in primary mode
> >>>>>>>>> could force a secondary mode switch: there is no sane and easy solution
> >>>>>>>>> to this unfortunately.
> >>>>>>>> This is exactly Sigmatek's problem: Our customers develop code
> >>>>>>>> within our debugging/development environment. We want to catch
> >>>>>>>> this situation (the developer implements a while(1)) with a
> >>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
> >>>>>>>> and can locate the problem according to the stack frame...
> >>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
> >>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
> >>>>>>> hopelessly broken rest - system alive again.
> >>>>>>>
> >>>>>>> You can then debug the former and need to do code review on the latter.
> >>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
> >>>>>>> even more clever checks) to library services that the code under
> >>>>>>> suspicion usually invokes.
> >>>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
> >>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
> >>>>>> PC to an invalid address (after having printed the real PC). gdb will
> >>>>>> not be able to print where the program stopped, but should be able to
> >>>>>> print the backtrace.
> >>>>>>
> >>>>> Actually, we could extend this logic and forge a stack frame to return
> >>>>> to the preempted application code via some userland trampoline code,
> >>>>> doing the switch:
> >>>>>
> >>>>> [watchdog trigger]
> >>>>> 	forge_return_frame(on =regs->sp, to =regs->pc);
> >>>>> 	regs->pc = __oops_I_did_it_again;
> >>>>>
> >>>>> __oops_I_did_it_again:
> >>>>> 	__xn_migrate(LINUX_DOMAIN);
> >>>>> 	ret (via forged frame)
> >>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
> >>>> part has to reside in user space, no?
> >>> Clearly, yes. Either we map this explicitly, or we just make sure to
> >>> compile it in each app, and pass its address at skin binding time. Our
> >>> text is mmlocked anyway.
> >>>
> >>>>> The thing is, that this brings in some arch-dep code to forge a stack
> >>>>> frame (like the kernel uses for signals), that should rather live in the
> >>>>> pipeline core.
> >>>> Actually, we are then close to enabling signal delivery outside syscalls...
> >>>>
> >>> Yes, looks like.
> >> When thinking about this real-signals thing, I was thinking about
> >> putting the forging code into Xenomai (the code is the same for all
> >> kernel versions, so there is no reason to put it into the I-pipe, and we
> >> may have to emit a special syscall to restore the context when handling
> >> the signal is done). What we need the I-pipe for, however, is to trigger
> >> some event on the way back to user-space.
> >>
> > 
> > A reason to have this code in the pipeline core is because we would
> > duplicate the setup_rt_frame code already available from the vanilla
> > kernel. It's a bit like xnarch_switch_to: we used to open code most of
> > it in our arch-dep code, mostly duplicating the vanilla switch code, but
> > having switch_mm() ironed enough - on arm and powerpc at least - to be
> > callable from the Xenomai domain as well proved to be a serious relief.
> > 
> > Granted, the signal code is unlikely to change a lot, given the strong
> > ABI requirements this has wrt the glibc, but I'm always reluctant to
> > introduce duplicates at both ends of the system; I would rather factor
> > out that code and make it available to both domains, if that makes
> > sense.
> 
> I am not sure it really makes sense: the biggest part of the Linux code
> is used to setup the special frame passed as the last void * pointer of
> signal handlers with the SA_SIGINFO option, allowing (among others)
> signal handlers to use setcontext() to implement co-routines, and I am
> not sure we really want that. 

It's not about wanting that, it is about having it for free even though
we would not use it.

> And if you do some major revamping of
> Linux stack frame build functions, you will have merge conflicts every
> time you upgrade the I-pipe patch.
> 

I don't think so, for the same reason you suspect that the kernel
code does not change very often in that area.

> Besides, we still have the return through syscall issue: returning from
> the signal handler cannot be a simple "return" instruction, since we
> have to save and restore most registers.
> 

Sure, but this is not related to the place where you would put the
forging code. You may have a Xenomai syscall invoking a pipeline
service, we do that all the time actually.

Anyway, this issue is not critical to me. If you can achieve that goal
in plain Xenomai space without ending up with two pages of hairy
code for each arch, then I won't be pigheaded.

-- 
Philippe.





* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 10:19               ` Tschaeche IT-Services
@ 2010-06-02 10:48                 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 10:48 UTC (permalink / raw)
  To: Tschaeche IT-Services; +Cc: Jan Kiszka, xenomai

Tschaeche IT-Services wrote:
> Looking at your discussion, handling asynchronous Linux signals in a primary
> domain task, i.e. initiating the signal handling in the secondary domain
> *immediately*, is not a "must" for Xenomai (but would be nice).
> 
> Another solution might be to check the state of the AMOK task when Xenomai
> schedules it for execution: if Linux signals are pending, force a secondary
> domain switch. Thus, asynchronous Linux signals would be handled at the latest
> on primary domain scheduler activity, which would be sufficient for us...

As Philippe explained to you in the second answer you received to your
initial mail, that is impossible, because the function migrating threads
from primary to secondary mode cannot be called at arbitrary points. This
issue has been bugging us for some time; if that had worked, we would have
implemented it a long time ago.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 10:42                     ` Philippe Gerum
@ 2010-06-02 10:51                       ` Gilles Chanteperdrix
  0 siblings, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 10:51 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
> On Wed, 2010-06-02 at 12:19 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Wed, 2010-06-02 at 11:37 +0200, Gilles Chanteperdrix wrote:
>>>> Philippe Gerum wrote:
>>>>> On Wed, 2010-06-02 at 11:20 +0200, Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> On Wed, 2010-06-02 at 10:36 +0200, Gilles Chanteperdrix wrote:
>>>>>>>> Jan Kiszka wrote:
>>>>>>>>> Tschaeche IT-Services wrote:
>>>>>>>>>> On Tue, Jun 01, 2010 at 04:32:37PM +0200, Philippe Gerum wrote:
>>>>>>>>>>> Not in the absence of syscalls. We thought about this once already, when
>>>>>>>>>>> considering how a watchdog preempting a runaway task in primary mode
>>>>>>>>>>> could force a secondary mode switch: there is no sane and easy solution
>>>>>>>>>>> to this unfortunately.
>>>>>>>>>> This is exactly Sigmatek's problem: Our customers develop code
>>>>>>>>>> within our debugging/development environment. We want to catch
>>>>>>>>>> this situation (the developer implements a while(1)) with a
>>>>>>>>>> watchdog throwing SIGTRAP so that our debugger gets active
>>>>>>>>>> and can locate the problem according to the stack frame...
>>>>>>>>> CONFIG_XENO_OPT_WATCHDOG is probably what you are looking for. It tries
>>>>>>>>> to catch "well-behaving" broken threads via SIGDEBUG and kills the
>>>>>>>>> hopelessly broken rest - system alive again.
>>>>>>>>>
>>>>>>>>> You can then debug the former and need to do code review on the latter.
>>>>>>>>> Or you could also try to add some loop-breaking Xenomai syscalls (or
>>>>>>>>> even more clever checks) to library services that the code under
>>>>>>>>> suspicion usually invokes.
>>>>>>>> I am afraid "well-behaving" means emitting syscalls. We have a radical
>>>>>>>> way to cause a SIGSEGV to be sent to a thread having run amok: set its
>>>>>>>> PC to an invalid address (after having printed the real PC). gdb will
>>>>>>>> not be able to print where the program stopped, but should be able to
>>>>>>>> print the backtrace.
>>>>>>>>
>>>>>>> Actually, we could extend this logic and forge a stack frame to return
>>>>>>> to the preempted application code via some userland trampoline code,
>>>>>>> doing the switch:
>>>>>>>
>>>>>>> [watchdog trigger]
>>>>>>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>>>>>>> 	regs->pc = __oops_I_did_it_again;
>>>>>>>
>>>>>>> __oops_I_did_it_again:
>>>>>>> 	__xn_migrate(LINUX_DOMAIN);
>>>>>>> 	ret (via forged frame)
>>>>>> Yep, that's what came to my mind as well. But the __oops_I_did_it_again
>>>>>> part has to reside in user space, no?
>>>>> Clearly, yes. Either we map this explicitly, or we just make sure to
>>>>> compile it in each app, and pass its address at skin binding time. Our
>>>>> text is mmlocked anyway.
>>>>>
>>>>>>> The thing is, that this brings in some arch-dep code to forge a stack
>>>>>>> frame (like the kernel uses for signals), that should rather live in the
>>>>>>> pipeline core.
>>>>>> Actually, we are then close to enabling signal delivery outside syscalls...
>>>>>>
>>>>> Yes, looks like.
>>>> When thinking about this real-signals thing, I was thinking about
>>>> putting the forging code into Xenomai (the code is the same for all
>>>> kernel versions, so there is no reason to put it into the I-pipe, and we
>>>> may have to emit a special syscall to restore the context when handling
>>>> the signal is done). What we need the I-pipe for, however, is to trigger
>>>> some event on the way back to user-space.
>>>>
>>> A reason to have this code in the pipeline core is because we would
>>> duplicate the setup_rt_frame code already available from the vanilla
>>> kernel. It's a bit like xnarch_switch_to: we used to open code most of
>>> it in our arch-dep code, mostly duplicating the vanilla switch code, but
>>> having switch_mm() ironed enough - on arm and powerpc at least - to be
>>> callable from the Xenomai domain as well proved to be a serious relief.
>>>
>>> Granted, the signal code is unlikely to change a lot, given the strong
>>> ABI requirements this has wrt the glibc, but I'm always reluctant to
>>> introduce duplicates at both ends of the system; I would rather factor
>>> out that code and make it available to both domains, if that makes
>>> sense.
>> I am not sure it really makes sense: the biggest part of the Linux code
>> is used to setup the special frame passed as the last void * pointer of
>> signal handlers with the SA_SIGINFO option, allowing (among others)
>> signal handlers to use setcontext() to implement co-routines, and I am
>> not sure we really want that. 
> 
> It's not about wanting that, it is about having it for free even though
> we would not use it.
> 
>> And if you do some major revamping of
>> Linux stack frame build functions, you will have merge conflicts every
>> time you upgrade the I-pipe patch.
>>
> 
> I don't think so, for the same reason you suspect that the kernel
> code does not change very often in that area.
> 
>> Besides, we still have the return through syscall issue: returning from
>> the signal handler cannot be a simple "return" instruction, since we
>> have to save and restore most registers.
>>
> 
> Sure, but this is not related to the place where you would put the
> forging code. You may have a Xenomai syscall invoking a pipeline
> service, we do that all the time actually.

Yes, OK. We can do this by implementing a trampoline for signals in
user-space.

> 
> Anyway, this issue is not critical to me. If you can achieve that goal
> in plain Xenomai space without ending up with two pages of hairy
> code for each arch, then I won't be pigheaded.

I have posted what the code would look like from my point of view. It
does look pretty simple and linear to me, though it is two pages long.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02  9:15         ` Philippe Gerum
  2010-06-02  9:20           ` Jan Kiszka
  2010-06-02  9:21           ` Gilles Chanteperdrix
@ 2010-06-02 12:02           ` Daniele Nicolodi
  2010-06-02 13:47             ` Gilles Chanteperdrix
  2010-06-02 15:14             ` Philippe Gerum
  2 siblings, 2 replies; 27+ messages in thread
From: Daniele Nicolodi @ 2010-06-02 12:02 UTC (permalink / raw)
  To: xenomai

On 02/06/10 11:15, Philippe Gerum wrote:

> Actually, we could extend this logic and forge a stack frame to return
> to the preempted application code via some userland trampoline code,
> doing the switch:
> 
> [watchdog trigger]
> 	forge_return_frame(on =regs->sp, to =regs->pc);
> 	regs->pc = __oops_I_did_it_again;
> 
> __oops_I_did_it_again:
> 	__xn_migrate(LINUX_DOMAIN);
> 	ret (via forged frame)
> 
> The thing is, that this brings in some arch-dep code to forge a stack
> frame (like the kernel uses for signals), that should rather live in the
> pipeline core.

Am I too naive in thinking that this solution would let user space
choose what to do when the watchdog interrupts the current thread? In
your example, it would be enough to assign to __oops_I_did_it_again a
pointer to the function that has to be executed.

There will probably be hard constraints on what this function can do, but
it would be a nice feature for debugging and for solving application-specific
issues.

Cheers,
-- 
Daniele



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 12:02           ` Daniele Nicolodi
@ 2010-06-02 13:47             ` Gilles Chanteperdrix
  2010-06-02 15:14             ` Philippe Gerum
  1 sibling, 0 replies; 27+ messages in thread
From: Gilles Chanteperdrix @ 2010-06-02 13:47 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: xenomai

Daniele Nicolodi wrote:
> On 02/06/10 11:15, Philippe Gerum wrote:
> 
>> Actually, we could extend this logic and forge a stack frame to return
>> to the preempted application code via some userland trampoline code,
>> doing the switch:
>>
>> [watchdog trigger]
>> 	forge_return_frame(on =regs->sp, to =regs->pc);
>> 	regs->pc = __oops_I_did_it_again;
>>
>> __oops_I_did_it_again:
>> 	__xn_migrate(LINUX_DOMAIN);
>> 	ret (via forged frame)
>>
>> The thing is, that this brings in some arch-dep code to forge a stack
>> frame (like the kernel uses for signals), that should rather live in the
>> pipeline core.
> 
> Am I too naive in thinking that this solution would let user space
> choose what to do when the watchdog interrupts the current thread? In
> your example, it would be enough to assign to __oops_I_did_it_again a
> pointer to the function that has to be executed.
> 
> There will probably be hard constraints on what this function can do, but
> it would be a nice feature for debugging and for solving application-specific
> issues.

You already have that with SIGDEBUG. You can register whatever signal
handler you want for the SIGDEBUG signal. The same goes for SIGSEGV. The
only issue we are talking about here is that the SIGDEBUG mechanism does
not work when a piece of code is blocked in an infinite loop without
calling any syscall. But that should be a pretty rare case.

-- 
					    Gilles.



* Re: [Xenomai-help] Handling Linux Signals in primary domain context
  2010-06-02 12:02           ` Daniele Nicolodi
  2010-06-02 13:47             ` Gilles Chanteperdrix
@ 2010-06-02 15:14             ` Philippe Gerum
  1 sibling, 0 replies; 27+ messages in thread
From: Philippe Gerum @ 2010-06-02 15:14 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: xenomai

On Wed, 2010-06-02 at 14:02 +0200, Daniele Nicolodi wrote:
> On 02/06/10 11:15, Philippe Gerum wrote:
> 
> > Actually, we could extend this logic and forge a stack frame to return
> > to the preempted application code via some userland trampoline code,
> > doing the switch:
> > 
> > [watchdog trigger]
> > 	forge_return_frame(on =regs->sp, to =regs->pc);
> > 	regs->pc = __oops_I_did_it_again;
> > 
> > __oops_I_did_it_again:
> > 	__xn_migrate(LINUX_DOMAIN);
> > 	ret (via forged frame)
> > 
> > The thing is, that this brings in some arch-dep code to forge a stack
> > frame (like the kernel uses for signals), that should rather live in the
> > pipeline core.
> 
> Am I too naive in thinking that this solution would let user space
> choose what to do when the watchdog interrupts the current thread? In
> your example, it would be enough to assign to __oops_I_did_it_again a
> pointer to the function that has to be executed.
> 
> There will probably be hard constraints on what this function can do, but
> it would be a nice feature for debugging and for solving application-specific
> issues.

If your question is related to handling a watchdog trigger in a
syscall-less runaway loop, that method would likely allow for a user
intercept via some hook, yes. Everything sensible that helps debugging
will do.

> 
> Cheers,


-- 
Philippe.





end of thread

Thread overview: 27+ messages
2010-06-01 13:50 [Xenomai-help] Handling Linux Signals in primary domain context Tschaeche IT-Services
2010-06-01 13:52 ` Gilles Chanteperdrix
2010-06-01 13:59 ` Gilles Chanteperdrix
2010-06-01 14:32 ` Philippe Gerum
2010-06-01 15:54   ` Tschaeche IT-Services
2010-06-01 16:52     ` Tschaeche IT-Services
2010-06-01 16:58     ` Jan Kiszka
2010-06-02  8:36       ` Gilles Chanteperdrix
2010-06-02  9:14         ` Jan Kiszka
2010-06-02  9:15         ` Philippe Gerum
2010-06-02  9:20           ` Jan Kiszka
2010-06-02  9:28             ` Philippe Gerum
2010-06-02  9:37               ` Gilles Chanteperdrix
2010-06-02 10:06                 ` Philippe Gerum
2010-06-02 10:19                   ` Gilles Chanteperdrix
2010-06-02 10:42                     ` Philippe Gerum
2010-06-02 10:51                       ` Gilles Chanteperdrix
2010-06-02 10:29                   ` Gilles Chanteperdrix
2010-06-02  9:21           ` Gilles Chanteperdrix
2010-06-02  9:23             ` Jan Kiszka
2010-06-02 10:19               ` Tschaeche IT-Services
2010-06-02 10:48                 ` Gilles Chanteperdrix
2010-06-02  9:34             ` Philippe Gerum
2010-06-02  9:43               ` Gilles Chanteperdrix
2010-06-02 12:02           ` Daniele Nicolodi
2010-06-02 13:47             ` Gilles Chanteperdrix
2010-06-02 15:14             ` Philippe Gerum
