* [Xenomai] Target frozen with rtcan_virt
       [not found] <505EB99F.7050705@wanadoo.fr>
@ 2012-09-23  7:42 ` Thierry Bultel
  2012-09-23 13:23   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Thierry Bultel @ 2012-09-23  7:42 UTC (permalink / raw)
  To: xenomai

Hi,

My configuration is linux-3.2.21+ipipe-core-3.2.21-arm-1.patch, running in qemu 1.2.0 for versatile express.
I have CONFIG_SMP enabled but qemu is launched with 1 core only (otherwise it does not boot and stays hung just after "Booting the kernel",
but that is not the point here).


I have two processes using rtcan0 and rtcan1 respectively, with rtcan_virt.

Apparently, as soon as the 2nd process receives data, the target freezes.

I have attached gdb to qemu's gdbserver, and it seems that I am stuck in rtcan_raw_recvmsg,

on the

     rtdm_lock_get_irqsave(&rtcan_socket_lock, lock_ctx);

call.

Here is my backtrace:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x802e9288 in ?? ()
(gdb) bt
#0  0x802e9288 in ?? ()
Cannot access memory at address 0x2b0
(gdb) symbol-file workspace/Buildroot/output/build/linux-3.2.21/vmlinux
Reading symbols from /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/vmlinux...done.
(gdb) bt
#0  0x802e9288 in arch_spin_lock (lock=<optimized out>)
     at /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/arm/include/asm/spinlock.h:83
#1  rtcan_raw_recvmsg (context=0xb4eec800, user_info=0xbf8b95c0,
     msg=0x8a691f0c, flags=0) at drivers/xenomai/can/rtcan_raw.c:655
#2  0x800fdc64 in __rt_dev_recvmsg (user_info=0xbf8b95c0, fd=<optimized out>,
     msg=0x8a691f0c, flags=0) at kernel/xenomai/skins/rtdm/core.c:575
#3  0x80105528 in sys_rtdm_recvmsg (regs=0x8a691fb0)
     at kernel/xenomai/skins/rtdm/syscall.c:91
#4  0x8009e3d8 in do_hisyscall_event (data=<optimized out>,
     stage=<optimized out>, event=<optimized out>)
     at kernel/xenomai/nucleus/shadow.c:2342
#5  hisyscall_event (event=<optimized out>, ipd=0x8053d0c0,
     data=<optimized out>) at kernel/xenomai/nucleus/shadow.c:2454
#6  0x80072940 in ipipe_syscall_hook (ipd=<optimized out>, regs=0x1)
     at kernel/ipipe/compat.c:237
#7  0x80070788 in __ipipe_notify_syscall (regs=<optimized out>)
     at kernel/ipipe/core.c:972
#8  0x80018314 in __ipipe_syscall_root (scno=983106, regs=0x8a691fb0)
     at arch/arm/kernel/ipipe.c:384
#9  0x8000e818 in vector_swi () at arch/arm/kernel/entry-common.S:446
#10 0x8000e818 in vector_swi () at arch/arm/kernel/entry-common.S:446
#11 0x8000e818 in vector_swi () at arch/arm/kernel/entry-common.S:446

(these frames repeat indefinitely)


Commenting out the irqsave/restore calls in the rtcan_raw_recvmsg routine (which is obviously not the right thing to do) makes the freeze
go away and the communication run fine.

Honestly, I have serious doubts about my receive process; there is a good chance that it has gone into an infinite loop, and moreover the freeze is not
reproducible with rtcanrecv in its place.
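
For reference, the receive side boils down to a blocking recv loop on an AF_CAN socket run from a Xenomai task; here is a minimal sketch of the pattern (simplified, error handling trimmed, not my actual code):

#include <string.h>
#include <net/if.h>
#include <rtdm/rtdm.h>
#include <rtdm/rtcan.h>

/* Blocks in rt_dev_recv(), which ends up in rtcan_raw_recvmsg() in the
 * kernel, as shown in the backtrace above. */
static int recv_loop(const char *ifname)        /* e.g. "rtcan1" */
{
    struct sockaddr_can addr;
    struct can_frame frame;
    struct ifreq ifr;
    int s, ret;

    s = rt_dev_socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (s < 0)
        return s;

    strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
    rt_dev_ioctl(s, SIOCGIFINDEX, &ifr);

    memset(&addr, 0, sizeof(addr));
    addr.can_family = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;
    rt_dev_bind(s, (struct sockaddr *)&addr, sizeof(addr));

    for (;;) {
        /* waits on the socket's recv_sem, then takes rtcan_socket_lock
         * to copy the frame out of the ring buffer */
        ret = rt_dev_recv(s, &frame, sizeof(frame), 0);
        if (ret < 0)
            break;
        /* ... process the frame ... */
    }

    return rt_dev_close(s);
}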

What I do not understand is that I have CONFIG_XENO_OPT_WATCHDOG enabled with XENO_OPT_WATCHDOG_TIMEOUT = 4, and it does not trigger
(I have validated it on a separate test case, where it works).

How can I debug this issue?

Thanks
Thierry




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-09-23  7:42 ` [Xenomai] Target frozen with rtcan_virt Thierry Bultel
@ 2012-09-23 13:23   ` Gilles Chanteperdrix
  2012-09-23 14:57     ` Thierry Bultel
  2012-10-09 12:45     ` Thierry Bultel
  0 siblings, 2 replies; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-23 13:23 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: xenomai

On 09/23/2012 09:42 AM, Thierry Bultel wrote:

> Hi,
> 
> My configuration is linux-3.2.21+ipipe-core-3.2.21-arm-1.patch,
> running in qemu 1.2.0 for versatile express. I have CONFIG_SMP but
> qemu is launched with 1 core only (else it does not boot, keeps hung
> just after "booting the kernel", but that is not the point here).


Basically, versatile express is not a board supported by the I-pipe
patch, so the first thing is to check that the I-pipe port is working
correctly. For instance, I would run "tsc -w" first to see whether the tsc
emulation is working correctly. Then you can use this document as a checklist:

http://www.xenomai.org/index.php/I-pipe-core:ArmPorting

Also, some modifications would be needed to get it working in SMP mode.
It should be much easier to start with the I-pipe kernel for 3.4,
except for the "global timer", which does not seem to be present on versatile
express; some code would be needed to skip the initialization in
gt_setup when we detect a processor that does not have a global timer.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-09-23 13:23   ` Gilles Chanteperdrix
@ 2012-09-23 14:57     ` Thierry Bultel
  2012-09-23 14:59       ` Gilles Chanteperdrix
  2012-10-09 12:45     ` Thierry Bultel
  1 sibling, 1 reply; 42+ messages in thread
From: Thierry Bultel @ 2012-09-23 14:57 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles,

thanks for your advice.

I am not especially attached to the versatile express for emulation; I only want a board with a Cortex-A9
that boots in qemu, with a Linux kernel >= 3.2 and an available I-pipe patch.

Unfortunately qemu does not support the IMX6 yet, which is the board I am targeting, so I have to make do with another one.

If you have the name of a board that is already fully supported and fits my needs,
please tell me and I will use it with pleasure.

Meanwhile I have set CONFIG_SMP to "no", and my recv program now works fine.

I have not been able to identify (in the CONFIG_SMP=y case) any difference between
"rtcanrecv" and my recv program that could explain the different behaviour (freeze or not),
apart from the footprint.

cheers,
Thierry

On 23/09/2012 15:23, Gilles Chanteperdrix wrote:

> On 09/23/2012 09:42 AM, Thierry Bultel wrote:
>
>> Hi,
>>
>> My configuration is linux-3.2.21+ipipe-core-3.2.21-arm-1.patch,
>> running in qemu 1.2.0 for versatile express. I have CONFIG_SMP but
>> qemu is launched with 1 core only (else it does not boot, keeps hung
>> just after "booting the kernel", but that is not the point here).
>
> Basically, versatile express is not a board supported by the I-pipe
> patch, so, the first thing is to check that the I-pipe port is working
> correctly. For instance, I would run tsc -w first to see if the tsc
> emulation is working correctly. Then you can use this document:
>
> http://www.xenomai.org/index.php/I-pipe-core:ArmPorting
>
> As a check list.
>
> Also, some modifications would be needed to get it working in SMP mode.
> This should be much easier to start with the I-pipe kernel for 3.4,
> except for the "global timer" which seems not to be present on versatile
> express, so, some code would be needed to skip the initialization in
> gt_setup when we detect a processor that does not have a global timer.
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-09-23 14:57     ` Thierry Bultel
@ 2012-09-23 14:59       ` Gilles Chanteperdrix
  0 siblings, 0 replies; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-09-23 14:59 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: xenomai

On 09/23/2012 04:57 PM, Thierry Bultel wrote:

> Gilles,
> 
> thanks for your advice.
> 
> I do not mind using the versatile express especially for emulation, I
> only want a board with a cortex A9 that boots in qemu, with a linux
> kernel >= 3.2, and an available ipipe patch.
> 
> Unfortunately qemu does not support the IMX6 yet, which is the board
> I aim, so I have to cope with another one.
> 
> If you have a board name that is already fully supported and that
> fits my needs, please tell me and I will use it with pleasure.


OMAP4 and IMX6 are the only SMP SOCs I could test. That said, the
I-pipe core for 3.4 should normally run on any Cortex-A9 based processor. For the
I-pipe core for 3.2, you need to do a few things to get it working, which I
tried and described in the wiki page.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-09-23 13:23   ` Gilles Chanteperdrix
  2012-09-23 14:57     ` Thierry Bultel
@ 2012-10-09 12:45     ` Thierry Bultel
  2012-10-09 21:34       ` Gilles Chanteperdrix
  1 sibling, 1 reply; 42+ messages in thread
From: Thierry Bultel @ 2012-10-09 12:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi, I would like to re-open this thread.

I have reproduced the same target freeze on another configuration:
linux-3.2.21+ipipe-core-3.2.21-x86-1.patch,

running on kvm (only one core).

rtcan0 and rtcan1 are devices of the unmodified "rtcan_virt" rtdm driver.

The test sample is the unmodified 
"xenomai-2.6.1/examples/rtdm/profiles/can/rtcan_rtt.c"

[root@buildroot ~]# ./rtcan_rtt rtcan0 rtcan1
1 3
RX rxsock=896, ifr_name=rtcan1
TX txsock=897, ifr_name=rtcan0
Round-Trip-Time test rtcan0 -> rtcan1 with CAN ID 0x1
Cycle time: 10000 us
All RTT timing figures are in us.


---> FROZEN

#


Here's my backtrace, with gdb on qemu's gdbserver:

(gdb) l
58
59        for (;;) {
60            if (inc.head == inc.tail)
61                break;
62            cpu_relax();
63            inc.head = ACCESS_ONCE(lock->tickets.head);
64        }
65        barrier();        /* make sure nothing creeps before the lock is taken */
66    }
67
(gdb) bt
#0  __ticket_spin_lock (lock=<optimized out>) at /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/x86/include/asm/spinlock.h:63
#1  arch_spin_lock (lock=<optimized out>) at /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/x86/include/asm/spinlock.h:129
#2  rtcan_raw_recvmsg (context=0xc7ff0000, user_info=0xd00b2290, msg=0xd796df04, flags=<optimized out>) at drivers/xenomai/can/rtcan_raw.c:655
#3  0xc111f205 in __rt_dev_recvmsg (user_info=<optimized out>, fd=<optimized out>, msg=<error reading variable: DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjunction with DW_OP_piece or DW_OP_bit_piece.>, flags=0) at kernel/xenomai/skins/rtdm/core.c:575
#4  0xc11241e7 in sys_rtdm_recvmsg (regs=0xd796dfb4) at kernel/xenomai/skins/rtdm/syscall.c:91
#5  0xc10d6dfb in do_hisyscall_event (data=<optimized out>, stage=<optimized out>, event=<optimized out>) at kernel/xenomai/nucleus/shadow.c:2342
#6  hisyscall_event (event=<optimized out>, ipd=<optimized out>, data=<optimized out>) at kernel/xenomai/nucleus/shadow.c:2454
#7  0xc109ced6 in ipipe_syscall_hook (ipd=<optimized out>, regs=<optimized out>) at kernel/ipipe/compat.c:237
#8  0xc109b51f in __ipipe_notify_syscall (regs=<optimized out>) at kernel/ipipe/core.c:972
#9  0xc101afb4 in __ipipe_syscall_root (regs=0xd796dfb4) at arch/x86/kernel/ipipe.c:556
#10 0xc1684738 in ?? () at arch/x86/kernel/entry_32.S:479
#11 0xb76ea424 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)


In my previous configuration, disabling CONFIG_SMP was enough to
work around it, but in this new case
I have other bad side effects (console and X server frozen at startup)
that prevent me from working correctly.

Regards
Thierry

On 23/09/2012 15:23, Gilles Chanteperdrix wrote:
> On 09/23/2012 09:42 AM, Thierry Bultel wrote:
>
>> Hi,
>>
>> My configuration is linux-3.2.21+ipipe-core-3.2.21-arm-1.patch,
>> running in qemu 1.2.0 for versatile express. I have CONFIG_SMP but
>> qemu is launched with 1 core only (else it does not boot, keeps hung
>> just after "booting the kernel", but that is not the point here).
>
> Basically, versatile express is not a board supported by the I-pipe
> patch, so, the first thing is to check that the I-pipe port is working
> correctly. For instance, I would run tsc -w first to see if the tsc
> emulation is working correctly. Then you can use this document:
>
> http://www.xenomai.org/index.php/I-pipe-core:ArmPorting
>
> As a check list.
>
> Also, some modifications would be needed to get it working in SMP mode.
> This should be much easier to start with the I-pipe kernel for 3.4,
> except for the "global timer" which seems not to be present on versatile
> express, so, some code would be needed to skip the initialization in
> gt_setup when we detect a processor that does not have a global timer.
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-09 12:45     ` Thierry Bultel
@ 2012-10-09 21:34       ` Gilles Chanteperdrix
  2012-10-09 21:41         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-09 21:34 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: xenomai

On 10/09/2012 02:45 PM, Thierry Bultel wrote:
> (snip)

> (gdb) bt
> #0  __ticket_spin_lock (lock=<optimized out>) at 
> /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/x86/include/asm/spinlock.h:63
> #1  arch_spin_lock (lock=<optimized out>) at 
> /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/x86/include/asm/spinlock.h:129
> #2  rtcan_raw_recvmsg (context=0xc7ff0000, user_info=0xd00b2290, 
> msg=0xd796df04, flags=<optimized out>) at 
> drivers/xenomai/can/rtcan_raw.c:655
> (snip)
> 
> 
> In my previous configuration, disabling CONFIG_SMP was enough to 
> workaround it, but in that new case
> I have other bad side-effects (console and X server frozen at startup) 
> that prevent me to work correctly.


That is essentially the same backtrace as in the first case. Could you
try the following patch?

diff --git a/ksrc/drivers/can/rtcan_raw.c b/ksrc/drivers/can/rtcan_raw.c
index ded024e..b07f19c 100644
--- a/ksrc/drivers/can/rtcan_raw.c
+++ b/ksrc/drivers/can/rtcan_raw.c
@@ -695,10 +695,7 @@ ssize_t rtcan_raw_recvmsg(struct rtdm_dev_context *context,
 
     /* Message completely read from the socket's ring buffer. Now check if
      * caller is just peeking. */
-    if (flags & MSG_PEEK)
-	/* Next one, please! */
-	rtdm_sem_up(&sock->recv_sem);
-    else
+    if ((flags & MSG_PEEK) == 0)
 	/* Adjust begin of first message in the ring buffer. */
 	sock->recv_head = recv_buf_index;
 
@@ -707,6 +704,11 @@ ssize_t rtcan_raw_recvmsg(struct rtdm_dev_context *context,
     rtdm_lock_put_irqrestore(&rtcan_socket_lock, lock_ctx);
 
 
+    if (flags & MSG_PEEK)
+	/* Next one, please! */
+	rtdm_sem_up(&sock->recv_sem);
+
+
     /* Create CAN socket address to give back */
     if (msg->msg_namelen) {
 	scan.can_family = AF_CAN;


-- 
                                                                Gilles.


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-09 21:34       ` Gilles Chanteperdrix
@ 2012-10-09 21:41         ` Gilles Chanteperdrix
  2012-10-09 21:51           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-09 21:41 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: xenomai

On 10/09/2012 11:34 PM, Gilles Chanteperdrix wrote:

> On 10/09/2012 02:45 PM, Thierry Bultel wrote:
>> (snip)
> 
>> (gdb) bt
>> #0  __ticket_spin_lock (lock=<optimized out>) at 
>> /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/x86/include/asm/spinlock.h:63
>> #1  arch_spin_lock (lock=<optimized out>) at 
>> /home/thierry/workspace/Buildroot/output/build/linux-3.2.21/arch/x86/include/asm/spinlock.h:129
>> #2  rtcan_raw_recvmsg (context=0xc7ff0000, user_info=0xd00b2290, 
>> msg=0xd796df04, flags=<optimized out>) at 
>> drivers/xenomai/can/rtcan_raw.c:655
>> (snip)
>>
>>
>> In my previous configuration, disabling CONFIG_SMP was enough to 
>> workaround it, but in that new case
>> I have other bad side-effects (console and X server frozen at startup) 
>> that prevent me to work correctly.
> 
> 
> That is essentially the same backtrace as in the first case. Could you
> try the following patch?
> 
> diff --git a/ksrc/drivers/can/rtcan_raw.c b/ksrc/drivers/can/rtcan_raw.c
> index ded024e..b07f19c 100644
> --- a/ksrc/drivers/can/rtcan_raw.c
> +++ b/ksrc/drivers/can/rtcan_raw.c
> @@ -695,10 +695,7 @@ ssize_t rtcan_raw_recvmsg(struct rtdm_dev_context *context,
>  
>      /* Message completely read from the socket's ring buffer. Now check if
>       * caller is just peeking. */
> -    if (flags & MSG_PEEK)
> -	/* Next one, please! */
> -	rtdm_sem_up(&sock->recv_sem);
> -    else
> +    if ((flags & MSG_PEEK) == 0)
>  	/* Adjust begin of first message in the ring buffer. */
>  	sock->recv_head = recv_buf_index;
>  
> @@ -707,6 +704,11 @@ ssize_t rtcan_raw_recvmsg(struct rtdm_dev_context *context,
>      rtdm_lock_put_irqrestore(&rtcan_socket_lock, lock_ctx);
>  
>  
> +    if (flags & MSG_PEEK)
> +	/* Next one, please! */
> +	rtdm_sem_up(&sock->recv_sem);
> +
> +
>      /* Create CAN socket address to give back */
>      if (msg->msg_namelen) {
>  	scan.can_family = AF_CAN;
> 
> 


Another possible fix would be to lock the Xenomai scheduler when taking an
rtdm spinlock, as it is almost always wrong to call xnpod_schedule in
such a situation. It works for the other RT-CAN drivers because they call
rtdm_sem_up from interrupt context, where the scheduler is locked by the
XNINIRQ bit.
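
To make the difference concrete, here is a simplified sketch (not actual driver code; the names are made up, and recv_sem and lock stand in for sock->recv_sem and rtcan_socket_lock):

#include <rtdm/rtdm_driver.h>

static rtdm_sem_t recv_sem;                    /* stands for sock->recv_sem    */
static rtdm_lock_t lock = RTDM_LOCK_UNLOCKED;  /* stands for rtcan_socket_lock */

/* (a) Typical RT-CAN hardware driver: rtdm_sem_up() runs in the IRQ
 * handler.  The nucleus has XNINIRQ set there, so the xnpod_schedule()
 * it triggers is deferred until the handler returns, i.e. after every
 * spinlock has been released. */
static int can_hw_interrupt(rtdm_irq_t *irq_handle)
{
	/* ... fetch the frame from the controller, queue it ... */
	rtdm_sem_up(&recv_sem);        /* reschedule deferred: harmless */
	return RTDM_IRQ_HANDLED;
}

/* (b) rtcan_virt-like path: delivery runs in the sender's task context
 * with the rtcan locks held, so rtdm_sem_up() reschedules immediately;
 * on a single CPU the woken reader can then spin on a lock that the
 * preempted sender still owns. */
static void virt_tx_deliver(void)
{
	rtdm_lockctx_t lock_ctx;

	rtdm_lock_get_irqsave(&lock, lock_ctx);
	/* ... copy the frame into the reader's ring buffer ... */
	rtdm_sem_up(&recv_sem);        /* may switch away right here... */
	rtdm_lock_put_irqrestore(&lock, lock_ctx); /* ...before we ever get here */
}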

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-09 21:41         ` Gilles Chanteperdrix
@ 2012-10-09 21:51           ` Gilles Chanteperdrix
  2012-10-10  7:38             ` Thierry Bultel
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-09 21:51 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: xenomai

On 10/09/2012 11:41 PM, Gilles Chanteperdrix wrote:

> Another possible fix would be to lock xenomai scheduler when taking an
> rtdm_spin_lock, as it is almost always wrong to call xnpod_schedule in
> such a situation. It works for other RT-CAN drivers, because they call
> rtdm_sem_up from interrupt context where the scheduler is locked by the
> XNINIRQ bit.
> 


IOW, something like:

diff --git a/include/rtdm/rtdm_driver.h b/include/rtdm/rtdm_driver.h
index 68135e4..7b18c34 100644
--- a/include/rtdm/rtdm_driver.h
+++ b/include/rtdm/rtdm_driver.h
@@ -730,6 +730,7 @@ typedef unsigned long rtdm_lockctx_t;
 	do {							\
 		XENO_BUGON(RTDM, !rthal_local_irq_disabled());	\
 		rthal_spin_lock(lock);				\
+		xnpod_lock_sched();				\
 	} while (0)
 #endif
 
@@ -749,7 +750,11 @@ typedef unsigned long rtdm_lockctx_t;
  *
  * Rescheduling: never.
  */
-#define rtdm_lock_put(lock)	rthal_spin_unlock(lock)
+#define rtdm_lock_put(lock)					\
+	do {							\
+		rthal_spin_unlock(lock);			\
+		xnpod_unlock_sched();				\
+	} while (0)
 
 /**
  * Acquire lock and disable preemption
@@ -768,8 +773,11 @@ typedef unsigned long rtdm_lockctx_t;
  *
  * Rescheduling: never.
  */
-#define rtdm_lock_get_irqsave(lock, context)	\
-	rthal_spin_lock_irqsave(lock, context)
+#define rtdm_lock_get_irqsave(lock, context)				\
+	do {								\
+		rthal_spin_lock_irqsave(lock, context);			\
+		xnpod_lock_sched();					\
+	} while (0)
 
 /**
  * Release lock and restore preemption state
@@ -788,8 +796,12 @@ typedef unsigned long rtdm_lockctx_t;
  *
  * Rescheduling: possible.
  */
-#define rtdm_lock_put_irqrestore(lock, context)	\
-	rthal_spin_unlock_irqrestore(lock, context)
+#define rtdm_lock_put_irqrestore(lock, context)			\
+	do {							\
+		rthal_spin_unlock(lock);			\
+		xnpod_unlock_sched();				\
+		rthal_local_irq_restore(context);		\
+	} while (0)
 
 /**
  * Disable preemption locally



-- 
                                                                Gilles.


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-09 21:51           ` Gilles Chanteperdrix
@ 2012-10-10  7:38             ` Thierry Bultel
  2012-10-10  7:51               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Thierry Bultel @ 2012-10-10  7:38 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi Gilles,

Many thanks.
The first patch does not work, the second one does.
I think the reason why the first patch does not work is that in rtcan_virt we have:

rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
rtdm_lock_get(&rtcan_socket_lock);

...
--->        rtcan_rcv(rx_dev, &skb);
....

rtdm_lock_put(&rtcan_socket_lock);
rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);

and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem),

thus the same rescheduling with the locks held and interrupts off: the woken reader can preempt the sender
on the single CPU and then spin forever on rtcan_socket_lock (with interrupts off, which presumably also
explains why the watchdog never fired), while the lock owner can no longer run.

Are you not afraid of side effects with the second patch,
since it changes the overall behaviour?
Wouldn't you prefer to modify only rtcan_virt locally?

e.g. something (not tested) like:

    xnpod_lock_sched()
    rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
    rtdm_lock_get(&rtcan_socket_lock);

...

                rtcan_rcv(rx_dev, &skb);

....

    rtdm_lock_put(&rtcan_socket_lock);
    rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
    xnpod_unlock_sched()

Cheers
Thierry


On 09/10/2012 23:51, Gilles Chanteperdrix wrote:
> On 10/09/2012 11:41 PM, Gilles Chanteperdrix wrote:
>
>> Another possible fix would be to lock xenomai scheduler when taking an
>> rtdm_spin_lock, as it is almost always wrong to call xnpod_schedule in
>> such a situation. It works for other RT-CAN drivers, because they call
>> rtdm_sem_up from interrupt context where the scheduler is locked by the
>> XNINIRQ bit.
>>
>
> IOW, something like:
>
> diff --git a/include/rtdm/rtdm_driver.h b/include/rtdm/rtdm_driver.h
> index 68135e4..7b18c34 100644
> --- a/include/rtdm/rtdm_driver.h
> +++ b/include/rtdm/rtdm_driver.h
> @@ -730,6 +730,7 @@ typedef unsigned long rtdm_lockctx_t;
>  	do {							\
>  		XENO_BUGON(RTDM, !rthal_local_irq_disabled());	\
>  		rthal_spin_lock(lock);				\
> +		xnpod_lock_sched();				\
>  	} while (0)
>  #endif
>  
> @@ -749,7 +750,11 @@ typedef unsigned long rtdm_lockctx_t;
>   *
>   * Rescheduling: never.
>   */
> -#define rtdm_lock_put(lock)	rthal_spin_unlock(lock)
> +#define rtdm_lock_put(lock)					\
> +	do {							\
> +		rthal_spin_unlock(lock);			\
> +		xnpod_unlock_sched();				\
> +	} while (0)
>  
>  /**
>   * Acquire lock and disable preemption
> @@ -768,8 +773,11 @@ typedef unsigned long rtdm_lockctx_t;
>   *
>   * Rescheduling: never.
>   */
> -#define rtdm_lock_get_irqsave(lock, context)	\
> -	rthal_spin_lock_irqsave(lock, context)
> +#define rtdm_lock_get_irqsave(lock, context)				\
> +	do {								\
> +		rthal_spin_lock_irqsave(lock, context);			\
> +		xnpod_lock_sched();					\
> +	} while (0)
>  
>  /**
>   * Release lock and restore preemption state
> @@ -788,8 +796,12 @@ typedef unsigned long rtdm_lockctx_t;
>   *
>   * Rescheduling: possible.
>   */
> -#define rtdm_lock_put_irqrestore(lock, context)	\
> -	rthal_spin_unlock_irqrestore(lock, context)
> +#define rtdm_lock_put_irqrestore(lock, context)			\
> +	do {							\
> +		rthal_spin_unlock(lock);			\
> +		xnpod_unlock_sched();				\
> +		rthal_local_irq_restore(context);		\
> +	} while (0)
>  
>  /**
>   * Disable preemption locally
>
>
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  7:38             ` Thierry Bultel
@ 2012-10-10  7:51               ` Gilles Chanteperdrix
  2012-10-10  7:56                 ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10  7:51 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: Jan Kiszka, xenomai

On 10/10/2012 09:38 AM, Thierry Bultel wrote:
> Hi Gilles,
> 
> Many thanks,
> The first patch does not work, the second does.
> I think the reason for 1st patch why is that in rtcan_virt, we have
> 
> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
> rtdm_lock_get(&rtcan_socket_lock);
> 
> ...
> --->        rtcan_rcv(rx_dev, &skb);
> ....
> 
> rtdm_lock_put(&rtcan_socket_lock);
> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
> 
> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
> 
> thus the same re-scheduling stuff with interrupts locked.
> 
> Are you not not afraid of side effects with the second patch, 
> since you change the overall behaviour ?
> Won't you prefer a only locally modified rtcan_virt ?

We should ask Jan's opinion. In any case, if we adopt the second patch,
we might want to try and reduce the overhead of xnpod_unlock_sched.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  7:51               ` Gilles Chanteperdrix
@ 2012-10-10  7:56                 ` Jan Kiszka
  2012-10-10  8:04                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-10-10  7:56 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, xenomai

On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>> Hi Gilles,
>>
>> Many thanks,
>> The first patch does not work, the second does.
>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>
>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>> rtdm_lock_get(&rtcan_socket_lock);
>>
>> ...
>> --->        rtcan_rcv(rx_dev, &skb);
>> ....
>>
>> rtdm_lock_put(&rtcan_socket_lock);
>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>
>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>
>> thus the same re-scheduling stuff with interrupts locked.
>>
>> Are you not not afraid of side effects with the second patch, 
>> since you change the overall behaviour ?
>> Won't you prefer a only locally modified rtcan_virt ?
> 
> We should ask Jan's opinion. In any case, if we adopt the second patch,
> we might want to try and reduce the overhead of xnpod_unlock_sched.
> 

We were signaling the semaphore while holding a spin lock? That's a
clear bug. Your patch is aligning rtcan to the pattern we are also using
in RTnet. We just need to make sure (haven't looked at the full context
yet) that sock remains valid even after dropping the lock(s).

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  7:56                 ` Jan Kiszka
@ 2012-10-10  8:04                   ` Gilles Chanteperdrix
  2012-10-10  8:10                     ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10  8:04 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Thierry Bultel, xenomai

On 10/10/2012 09:56 AM, Jan Kiszka wrote:
> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>> Hi Gilles,
>>>
>>> Many thanks,
>>> The first patch does not work, the second does.
>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>
>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>> rtdm_lock_get(&rtcan_socket_lock);
>>>
>>> ...
>>> --->        rtcan_rcv(rx_dev, &skb);
>>> ....
>>>
>>> rtdm_lock_put(&rtcan_socket_lock);
>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>
>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>
>>> thus the same re-scheduling stuff with interrupts locked.
>>>
>>> Are you not not afraid of side effects with the second patch, 
>>> since you change the overall behaviour ?
>>> Won't you prefer a only locally modified rtcan_virt ?
>>
>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>
> 
> We were signaling the semaphore while holding a spin lock? That's a
> clear bug. Your patch is aligning rtcan to the pattern we are also using
> in RTnet. We just need to make sure (haven't looked at the full context
> yet) that sock remains valid even after dropping the lock(s).

The second patch idea was to lock the scheduler while spinlocks are
held, so that posting a semaphore while holding a spin lock is no longer
a bug.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  8:04                   ` Gilles Chanteperdrix
@ 2012-10-10  8:10                     ` Jan Kiszka
  2012-10-10  8:58                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-10-10  8:10 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>> Hi Gilles,
>>>>
>>>> Many thanks,
>>>> The first patch does not work, the second does.
>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>
>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>
>>>> ...
>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>> ....
>>>>
>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>
>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>
>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>
>>>> Are you not not afraid of side effects with the second patch, 
>>>> since you change the overall behaviour ?
>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>
>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>
>>
>> We were signaling the semaphore while holding a spin lock? That's a
>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>> in RTnet. We just need to make sure (haven't looked at the full context
>> yet) that sock remains valid even after dropping the lock(s).
> 
> The second patch idea was to lock the scheduler while spinlocks are
> held, so that posting a semaphore while holding a spin lock is no longer
> a bug.

Sounds a bit hacky, but I think we have this pattern
(RTDM_EXECUTE_ATOMICALLY) in rtcan somewhere already. IIRC, the
difference from RTnet is that we cannot easily lock down the affected
object (the socket) here.

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  8:10                     ` Jan Kiszka
@ 2012-10-10  8:58                       ` Gilles Chanteperdrix
  2012-10-10  9:01                         ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10  8:58 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 10/10/2012 10:10 AM, Jan Kiszka wrote:
> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>> Hi Gilles,
>>>>>
>>>>> Many thanks,
>>>>> The first patch does not work, the second does.
>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>
>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>
>>>>> ...
>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>> ....
>>>>>
>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>
>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>
>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>
>>>>> Are you not not afraid of side effects with the second patch, 
>>>>> since you change the overall behaviour ?
>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>
>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>
>>>
>>> We were signaling the semaphore while holding a spin lock? That's a
>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>> in RTnet. We just need to make sure (haven't looked at the full context
>>> yet) that sock remains valid even after dropping the lock(s).
>>
>> The second patch idea was to lock the scheduler while spinlocks are
>> held, so that posting a semaphore while holding a spin lock is no longer
>> a bug.
> 
> Sounds a bit hacky,

Well, that is what the linux kernel does.
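
For reference, the Linux pattern I am referring to is roughly this (illustration only):

#include <linux/spinlock.h>
#include <linux/wait.h>

static DEFINE_SPINLOCK(lock);
static DECLARE_WAIT_QUEUE_HEAD(waitq);

static void producer(void)
{
	spin_lock(&lock);     /* implies preempt_disable() */
	/* ... queue some data for the consumer ... */
	wake_up(&waitq);      /* marks the waiter runnable, no switch yet */
	spin_unlock(&lock);   /* preempt_enable() may reschedule here */
}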

> but I think we have this pattern
> (RTDM_EXECUTE_ATOMICALLY)

RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer: if you do
RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
then foo() and bar() are not executed atomically if sem_up wakes up another
thread.

So I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
talking about.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  8:58                       ` Gilles Chanteperdrix
@ 2012-10-10  9:01                         ` Jan Kiszka
  2012-10-10  9:23                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-10-10  9:01 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>> Hi Gilles,
>>>>>>
>>>>>> Many thanks,
>>>>>> The first patch does not work, the second does.
>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>
>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>
>>>>>> ...
>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>> ....
>>>>>>
>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>
>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>
>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>
>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>> since you change the overall behaviour ?
>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>
>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>
>>>>
>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>> yet) that sock remains valid even after dropping the lock(s).
>>>
>>> The second patch idea was to lock the scheduler while spinlocks are
>>> held, so that posting a semaphore while holding a spin lock is no longer
>>> a bug.
>>
>> Sounds a bit hacky,
> 
> Well, that is what the linux kernel does.
> 
>  but I think we have this pattern
>> (RTDM_EXECUTE_ATOMICALLY)
> 
> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
> foo() and bar() are not executed atomically if sem_up wakes up another
> thread.
> 
> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
> talking about.

RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
code, executing it atomically as rescheduling is postponed until the end
of the block.
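
From memory, the 2.6 implementation is roughly the following (simplified; see include/rtdm/rtdm_driver.h for the real definition):

#define RTDM_EXECUTE_ATOMICALLY(code_block)		\
{							\
	spl_t __rtdm_s;					\
							\
	xnlock_get_irqsave(&nklock, __rtdm_s);		\
	code_block;					\
	xnlock_put_irqrestore(&nklock, __rtdm_s);	\
}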

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  9:01                         ` Jan Kiszka
@ 2012-10-10  9:23                           ` Gilles Chanteperdrix
  2012-10-10 10:04                             ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10  9:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 10/10/2012 11:01 AM, Jan Kiszka wrote:
> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>> Hi Gilles,
>>>>>>>
>>>>>>> Many thanks,
>>>>>>> The first patch does not work, the second does.
>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>
>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>
>>>>>>> ...
>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>> ....
>>>>>>>
>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>
>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>
>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>
>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>> since you change the overall behaviour ?
>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>
>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>
>>>>>
>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>
>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>> a bug.
>>>
>>> Sounds a bit hacky,
>>
>> Well, that is what the linux kernel does.
>>
>>  but I think we have this pattern
>>> (RTDM_EXECUTE_ATOMICALLY)
>>
>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>> foo() and bar() are not executed atomically if sem_up wakes up another
>> thread.
>>
>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>> talking about.
> 
> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
> code, executing it atomically as rescheduling is postponed until the end
> of the block.

Err... no. Absolutely not. Calling xnpod_schedule while holding the
nucleus lock does not change anything; the rescheduling takes place at
that point. What you say only works if RTDM_EXECUTE_ATOMICALLY is called
from a secondary domain context, where xnpod_schedule needs the
escalation virq, and indeed the escalation virq is only handled when the
Xenomai domain is unstalled.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10  9:23                           ` Gilles Chanteperdrix
@ 2012-10-10 10:04                             ` Jan Kiszka
  2012-10-10 10:07                               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-10-10 10:04 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>> Hi Gilles,
>>>>>>>>
>>>>>>>> Many thanks,
>>>>>>>> The first patch does not work, the second does.
>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>
>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>
>>>>>>>> ...
>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>> ....
>>>>>>>>
>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>
>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>
>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>
>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>> since you change the overall behaviour ?
>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>
>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>
>>>>>>
>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>
>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>> a bug.
>>>>
>>>> Sounds a bit hacky,
>>>
>>> Well, that is what the linux kernel does.
>>>
>>>  but I think we have this pattern
>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>
>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>> thread.
>>>
>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>> talking about.
>>
>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>> code, executing it atomically as rescheduling is postponed until the end
>> of the block.
> 
> Err... no. Absolutely not.

Err... absolutely right.

The good news is: we don't need to worry about that kind of locking here. In
rtcan_raw_recvmsg, the socket is locked via the RTDM context, as we are
in a handler. So it won't disappear when we drop the lock, and your
first patch is fine.

Jan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 10:04                             ` Jan Kiszka
@ 2012-10-10 10:07                               ` Gilles Chanteperdrix
  2012-10-10 10:25                                 ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 10:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 10/10/2012 12:04 PM, Jan Kiszka wrote:
> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>> Hi Gilles,
>>>>>>>>>
>>>>>>>>> Many thanks,
>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>
>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>> ....
>>>>>>>>>
>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>
>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>
>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>
>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>
>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>
>>>>>>>
>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>
>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>> a bug.
>>>>>
>>>>> Sounds a bit hacky,
>>>>
>>>> Well, that is what the linux kernel does.
>>>>
>>>>  but I think we have this pattern
>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>
>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>> thread.
>>>>
>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>> talking about.
>>>
>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>> code, executing it atomically as rescheduling is postponed until the end
>>> of the block.
>>
>> Err... no. Absolutely not.
> 
> Err... absolutely right.
> 
> The good news is: we don't need to worry about such kind of locking. In
> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
> in a handler. So it won't disappear when we drop the lock, and your
> first patch is fine.

Which one? The first one does not seem to work, because the rtdm locks
are nested. The second one would probably require finding a way to
reduce the overhead of xnpod_unlock_sched(). What can be done, however,
is to add a call to xnpod_lock_sched()/xnpod_unlock_sched() in
RTDM_EXECUTE_ATOMICALLY.


-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 10:07                               ` Gilles Chanteperdrix
@ 2012-10-10 10:25                                 ` Jan Kiszka
  2012-10-10 10:54                                   ` Philippe Gerum
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-10-10 10:25 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>> Hi Gilles,
>>>>>>>>>>
>>>>>>>>>> Many thanks,
>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>
>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>> ....
>>>>>>>>>>
>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>
>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>
>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>
>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>
>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>
>>>>>>>>
>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>
>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>> a bug.
>>>>>>
>>>>>> Sounds a bit hacky,
>>>>>
>>>>> Well, that is what the linux kernel does.
>>>>>
>>>>>  but I think we have this pattern
>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>
>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>> thread.
>>>>>
>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>> talking about.
>>>>
>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>> code, executing it atomically as rescheduling is postponed until the end
>>>> of the block.
>>>
>>> Err... no. Absolutely not.
>>
>> Err... absolutely right.
>>
>> The good news is: we don't need to worry about such kind of locking. In
>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>> in a handler. So it won't disappear when we drop the lock, and your
>> first patch is fine.
> 
> Which one? The first one does not seem to work because the rtdm locks
> seem to be nested. The second one would probably need to find a way to
> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
> RTDM_EXECUTE_ATOMICALLY.

Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
used much on SMP so far. That indeed looks unresolvable without a
semantic change to rtdm_lock/unlock.

But then we really need something as lightweight as preempt_disable/preempt_enable.

Jan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 10:25                                 ` Jan Kiszka
@ 2012-10-10 10:54                                   ` Philippe Gerum
  2012-10-10 11:30                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 10:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Thierry Bultel, Wolfgang Grandegger, xenomai

On 10/10/2012 12:25 PM, Jan Kiszka wrote:
> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>
>>>>>>>>>>> Many thanks,
>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>
>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>> ....
>>>>>>>>>>>
>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>
>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>
>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>
>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>
>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>
>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>> a bug.
>>>>>>>
>>>>>>> Sounds a bit hacky,
>>>>>>
>>>>>> Well, that is what the linux kernel does.
>>>>>>
>>>>>>  but I think we have this pattern
>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>
>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>> thread.
>>>>>>
>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>> talking about.
>>>>>
>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>> of the block.
>>>>
>>>> Err... no. Absolutely not.
>>>
>>> Err... absolutely right.
>>>
>>> The good news is: we don't need to worry about such kind of locking. In
>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>> in a handler. So it won't disappear when we drop the lock, and your
>>> first patch is fine.
>>
>> Which one? The first one does not seem to work because the rtdm locks
>> seem to be nested. The second one would probably need to find a way to
>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>> RTDM_EXECUTE_ATOMICALLY.
> 
> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
> used much on SMP so far. That looks indeed unresolvable without a
> semantical change to rtdm_lock/unlock.
> 
> But then we really need something as light-weight as preempt_enable/disable.
> 

This is not as lightweight as it might be, given that we pair a flag and
a counter to achieve this (which saves one data reference in
xnpod_schedule(), though), but it is a start:

http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497

So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
in the nucleus API, to manipulate the scheduler locking counter from a
context where the nucleus lock is already held, so that RTDM can rely on
this for RTDM_EXECUTE_ATOMICALLY().
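
With that in place, RTDM_EXECUTE_ATOMICALLY could become something along these lines (untested sketch, just to give the idea):

#define RTDM_EXECUTE_ATOMICALLY(code_block)		\
{							\
	spl_t __rtdm_s;					\
							\
	xnlock_get_irqsave(&nklock, __rtdm_s);		\
	__xnpod_lock_sched();				\
	code_block;					\
	__xnpod_unlock_sched();				\
	xnlock_put_irqrestore(&nklock, __rtdm_s);	\
}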

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 10:54                                   ` Philippe Gerum
@ 2012-10-10 11:30                                     ` Gilles Chanteperdrix
  2012-10-10 12:09                                       ` Philippe Gerum
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 11:30 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 12:54 PM, Philippe Gerum wrote:

> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>
>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>
>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>> ....
>>>>>>>>>>>>
>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>
>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>
>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>
>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>
>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>
>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>> a bug.
>>>>>>>>
>>>>>>>> Sounds a bit hacky,
>>>>>>>
>>>>>>> Well, that is what the linux kernel does.
>>>>>>>
>>>>>>>  but I think we have this pattern
>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>
>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>> thread.
>>>>>>>
>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>> talking about.
>>>>>>
>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>> of the block.
>>>>>
>>>>> Err... no. Absolutely not.
>>>>
>>>> Err... absolutely right.
>>>>
>>>> The good news is: we don't need to worry about such kind of locking. In
>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>> first patch is fine.
>>>
>>> Which one? The first one does not seem to work because the rtdm locks
>>> seem to be nested. The second one would probably need to find a way to
>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>> RTDM_EXECUTE_ATOMICALLY.
>>
>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>> used much on SMP so far. That looks indeed unresolvable without a
>> semantical change to rtdm_lock/unlock.
>>
>> But then we really need something as light-weight as preempt_enable/disable.
>>
> This is not as lightweight as it might be given that we pair a flag and
> a counter to achieve this (which saves one data reference in
> xnpod_schedule() though), but this is a start:
> 
> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
> 
> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
> in the nucleus API to manipulate the sched locking counter from a
> context where the nucleus lock is already held, so that RTDM can rely on
> this for RTDM_EXECUTE_ATOMICALLY().


The problem with xnpod_unlock_sched(), from my point of view, is this
section of code:

 		xnsched_set_self_resched(curr->sched);
 		xnpod_schedule();

It means that we end up in the full-blown __xnpod_schedule() when
unlocking the outermost spinlock.

I guess what I meant is that we should have a scheduler bit that is
simply tested in xnpod_schedule(), but then we lose the ability for
threads holding the scheduler lock to suspend themselves. So maybe we
should define another xnpod_lock/unlock pair, something like
xnpod_preempt_disable()/xnpod_preempt_enable().
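
A rough sketch of the lighter pair suggested here, assuming the lock bit
has been moved into sched->status as described above; the names and the
bit handling are illustrative, not existing API:

/*
 * Illustrative only: flip a status bit on the current scheduler and
 * let xnpod_schedule() test it along with the other status bits.
 * Assumes the caller cannot migrate while the bit is set (e.g. hard
 * irqs off or nklock held), which is the usual nucleus context.
 */
static inline void xnpod_preempt_disable(void)
{
	__setbits(xnpod_current_sched()->status, XNLOCK);
}

static inline void xnpod_preempt_enable(void)
{
	__clrbits(xnpod_current_sched()->status, XNLOCK);
	xnpod_schedule();	/* no-op unless a resched is pending */
}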


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 11:30                                     ` Gilles Chanteperdrix
@ 2012-10-10 12:09                                       ` Philippe Gerum
  2012-10-10 12:33                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 12:09 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
> 
>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>
>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>> ....
>>>>>>>>>>>>>
>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>
>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>
>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>
>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>
>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>> a bug.
>>>>>>>>>
>>>>>>>>> Sounds a bit hacky,
>>>>>>>>
>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>
>>>>>>>>  but I think we have this pattern
>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>
>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>> thread.
>>>>>>>>
>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>> talking about.
>>>>>>>
>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>> of the block.
>>>>>>
>>>>>> Err... no. Absolutely not.
>>>>>
>>>>> Err... absolutely right.
>>>>>
>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>> first patch is fine.
>>>>
>>>> Which one? The first one does not seem to work because the rtdm locks
>>>> seem to be nested. The second one would probably need to find a way to
>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>> RTDM_EXECUTE_ATOMICALLY.
>>>
>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>> used much on SMP so far. That looks indeed unresolvable without a
>>> semantical change to rtdm_lock/unlock.
>>>
>>> But then we really need something as light-weight as preempt_enable/disable.
>>>
>> This is not as lightweight as it might be given that we pair a flag and
>> a counter to achieve this (which saves one data reference in
>> xnpod_schedule() though), but this is a start:
>>
>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>
>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>> in the nucleus API to manipulate the sched locking counter from a
>> context where the nucleus lock is already held, so that RTDM can rely on
>> this for RTDM_EXECUTE_ATOMICALLY().
> 
> 
> The problem of xnpod_unlock_sched from my point of view is this section
> of code:
> 
>  		xnsched_set_self_resched(curr->sched);
>  		xnpod_schedule();
> 
> It means that we will go to the full blown __xnpod_schedule when
> unlocking the top-most spinlock.
> 
> I guess what I meant is that we should have a scheduler bit that is
> simply tested in xnpod_schedule, but then we loose the ability for the
> threads with the scheduler locked to suspend themselves. So, maybe we
> should define another xnpod_lock/unlock pair, maybe something like
> xnpod_preempt_disablle()/xnpod_preempt_enabled().
> 
> 

xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.

The main issue we have in really optimizing the implementation is
hidden deep in the way we refer to the current running context: we
have to go through the current scheduler structure to get the current
thread, which in turn requires disabling preemption... which is
overkill by design.

Core pipelines used in non-legacy mode allow the client code to define a
stack-based thread-info block, but this won't work for 2.6.x, which
follows the legacy API. It will work in 3.x, once we have gotten rid of
these crappy Xenomai-specific stacks for kernel RT threads.
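
To illustrate the point with simplified pseudo-code (not the actual
nucleus sources; current_stack_pointer(), STACK_SIZE and struct
xnthread_info are made-up names): the legacy lookup has to dereference
the per-CPU scheduler slot, so the caller must already be pinned, while
a stack-based thread-info block is reachable from the stack pointer
alone.

/* 2.6.x-style lookup: goes through the per-CPU scheduler slot, so the
 * caller must not be preempted or migrated across this step. */
xnthread_t *curr = xnpod_current_sched()->curr;

/* 3.x-style idea, similar to Linux's current_thread_info(): derive the
 * thread-info block from the current stack pointer, with no per-CPU
 * access and no need to disable preemption just to know who runs. */
struct xnthread_info *ti = (struct xnthread_info *)
	(current_stack_pointer() & ~(STACK_SIZE - 1));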

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:09                                       ` Philippe Gerum
@ 2012-10-10 12:33                                         ` Gilles Chanteperdrix
  2012-10-10 12:41                                           ` Philippe Gerum
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 12:33 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:09 PM, Philippe Gerum wrote:
> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>
>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>
>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>> a bug.
>>>>>>>>>>
>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>
>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>
>>>>>>>>>  but I think we have this pattern
>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>
>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>> thread.
>>>>>>>>>
>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>> talking about.
>>>>>>>>
>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>> of the block.
>>>>>>>
>>>>>>> Err... no. Absolutely not.
>>>>>>
>>>>>> Err... absolutely right.
>>>>>>
>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>> first patch is fine.
>>>>>
>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>> seem to be nested. The second one would probably need to find a way to
>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>
>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>> semantical change to rtdm_lock/unlock.
>>>>
>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>
>>> This is not as lightweight as it might be given that we pair a flag and
>>> a counter to achieve this (which saves one data reference in
>>> xnpod_schedule() though), but this is a start:
>>>
>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>
>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>> in the nucleus API to manipulate the sched locking counter from a
>>> context where the nucleus lock is already held, so that RTDM can rely on
>>> this for RTDM_EXECUTE_ATOMICALLY().
>>
>>
>> The problem of xnpod_unlock_sched from my point of view is this section
>> of code:
>>
>>  		xnsched_set_self_resched(curr->sched);
>>  		xnpod_schedule();
>>
>> It means that we will go to the full blown __xnpod_schedule when
>> unlocking the top-most spinlock.
>>
>> I guess what I meant is that we should have a scheduler bit that is
>> simply tested in xnpod_schedule, but then we loose the ability for the
>> threads with the scheduler locked to suspend themselves. So, maybe we
>> should define another xnpod_lock/unlock pair, maybe something like
>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>
>>
> 
> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.

Yes, but the reason why we have to go to __xnpod_schedule() in
xnpod_unlock_sched() is to allow a thread to be suspended with the
scheduler locked (you added that at some point for the VxWorks emulator,
if I remember correctly). But without that need, we can put the XNLOCK
bit in the sched->status structure, simply test that bit with all the
others in xnpod_schedule(), and forget about the need to call
xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
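
In other words, the fast path could look something like this. Sketch
only: the exact set of status bits tested in 2.6 is quoted from memory,
and keeping XNLOCK in sched->status is the proposal above, not current
code.

static inline void xnpod_schedule(void)
{
	xnsched_t *sched = xnpod_current_sched();

	/*
	 * Only a pending XNRESCHED with no blocking bit set falls
	 * through to the full __xnpod_schedule(); interrupt context
	 * or a locked scheduler returns immediately.
	 */
	if (testbits(sched->status, XNINIRQ | XNLOCK | XNRESCHED) != XNRESCHED)
		return;

	__xnpod_schedule(sched);
}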

> 
> The main issue we have for really optimizing the implementation, is
> hidden deep into the way we refer to the current running context: we do
> have to go through the current scheduler structure to get the current
> thread, which in turn requires to disable preemption...which is overkill
> by design.

And the fact that we do it over and over again, even for a single skin
service implementation.

> 
> core pipelines used in non-legacy mode allow the client code to define a
> stack-based thread-info block, but this won't work for 2.6.x which
> follows the legacy API. And this will work in 3.x when we will have
> gotten rid of these crappy Xenomai-specific stacks for kernel rt threads.
> 

I agree.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:33                                         ` Gilles Chanteperdrix
@ 2012-10-10 12:41                                           ` Philippe Gerum
  2012-10-10 12:43                                             ` Philippe Gerum
  2012-10-10 12:55                                             ` Gilles Chanteperdrix
  0 siblings, 2 replies; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 12:41 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>
>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>
>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>> a bug.
>>>>>>>>>>>
>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>
>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>
>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>> thread.
>>>>>>>>>>
>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>> talking about.
>>>>>>>>>
>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>> of the block.
>>>>>>>>
>>>>>>>> Err... no. Absolutely not.
>>>>>>>
>>>>>>> Err... absolutely right.
>>>>>>>
>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>> first patch is fine.
>>>>>>
>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>
>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>> semantical change to rtdm_lock/unlock.
>>>>>
>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>
>>>> This is not as lightweight as it might be given that we pair a flag and
>>>> a counter to achieve this (which saves one data reference in
>>>> xnpod_schedule() though), but this is a start:
>>>>
>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>
>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>> in the nucleus API to manipulate the sched locking counter from a
>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>
>>>
>>> The problem of xnpod_unlock_sched from my point of view is this section
>>> of code:
>>>
>>>  		xnsched_set_self_resched(curr->sched);
>>>  		xnpod_schedule();
>>>
>>> It means that we will go to the full blown __xnpod_schedule when
>>> unlocking the top-most spinlock.
>>>
>>> I guess what I meant is that we should have a scheduler bit that is
>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>> should define another xnpod_lock/unlock pair, maybe something like
>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>
>>>
>>
>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
> 
> Yes, but the reason why we have to go to __xnpod_schedule in
> xnpod_unlock_sched() is to allow thread to be suspended with the
> scheduler locked (you added that at some point for the vxworks emulator,
> if I remember correctly). But without that need, we can put the XNLOCK
> bit in the sched->status structure, and simply test that bit with all
> the others in xnpod_schedule(), and forget about the need to call
> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
> 

You may have to reschedule after an IRQ has interrupted the lock owner
inside the locked section, in which case you have to make sure to
reschedule when dropping the sched lock.

This change was not directly related to holding the scheduler lock
across rescheduling points. The motivation was to plug a caveat that
prevented fully non-preemptible sections from being built with a
rescheduling point in the middle: sometimes the app logic really wants
all operations to run un-preempted before the caller switches out.
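
As an illustration of that use case (every name below is made up except
xnpod_lock_sched()/xnpod_unlock_sched() and rtdm_sem_up()), the consumer
woken in the middle cannot run before the producer has finished the
whole section:

void producer_step(struct my_ctx *ctx)	/* hypothetical driver context */
{
	xnpod_lock_sched();

	update_shared_state(ctx);	/* must not be preempted */
	rtdm_sem_up(&ctx->ready_sem);	/* rescheduling point, deferred */
	finalize_shared_state(ctx);	/* still runs before any switch */

	xnpod_unlock_sched();		/* the consumer may preempt us here */
}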

>>
>> The main issue we have for really optimizing the implementation, is
>> hidden deep into the way we refer to the current running context: we do
>> have to go through the current scheduler structure to get the current
>> thread, which in turn requires to disable preemption...which is overkill
>> by design.
> 
> And the fact that we do it over and over again for even only one skin
> service implementation.
> 
>>
>> core pipelines used in non-legacy mode allow the client code to define a
>> stack-based thread-info block, but this won't work for 2.6.x which
>> follows the legacy API. And this will work in 3.x when we will have
>> gotten rid of these crappy Xenomai-specific stacks for kernel rt threads.
>>
> 
> I agree.
> 


-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:41                                           ` Philippe Gerum
@ 2012-10-10 12:43                                             ` Philippe Gerum
  2012-10-10 12:45                                               ` Philippe Gerum
  2012-10-10 12:55                                             ` Gilles Chanteperdrix
  1 sibling, 1 reply; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 12:43 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:41 PM, Philippe Gerum wrote:
> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>
>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>
>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>
>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>> thread.
>>>>>>>>>>>
>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>> talking about.
>>>>>>>>>>
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>> of the block.
>>>>>>>>>
>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>
>>>>>>>> Err... absolutely right.
>>>>>>>>
>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>> first patch is fine.
>>>>>>>
>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>
>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>
>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>
>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>> a counter to achieve this (which saves one data reference in
>>>>> xnpod_schedule() though), but this is a start:
>>>>>
>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>
>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>
>>>>
>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>> of code:
>>>>
>>>>  		xnsched_set_self_resched(curr->sched);
>>>>  		xnpod_schedule();
>>>>
>>>> It means that we will go to the full blown __xnpod_schedule when
>>>> unlocking the top-most spinlock.
>>>>
>>>> I guess what I meant is that we should have a scheduler bit that is
>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>
>>>>
>>>
>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>
>> Yes, but the reason why we have to go to __xnpod_schedule in
>> xnpod_unlock_sched() is to allow thread to be suspended with the
>> scheduler locked (you added that at some point for the vxworks emulator,
>> if I remember correctly). But without that need, we can put the XNLOCK
>> bit in the sched->status structure, and simply test that bit with all
>> the others in xnpod_schedule(), and forget about the need to call
>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>
> 
> You may have to reschedule after an IRQ has interrupted the lock owner
> inside the locked section, in which case you have to make sure to
> reschedule when dropping the sched lock.
> 
> This change was not

s/was not/was/

 directly related to holding the scheduler lock
> across rescheduling points. The motivation was to plug a caveat
> preventing fully non-preemptible sections to be built with a
> rescheduling point in the middle: sometimes the app logic really wants
> all operations to run un-preempted before the caller switches out.
> 
>>>
>>> The main issue we have for really optimizing the implementation, is
>>> hidden deep into the way we refer to the current running context: we do
>>> have to go through the current scheduler structure to get the current
>>> thread, which in turn requires to disable preemption...which is overkill
>>> by design.
>>
>> And the fact that we do it over and over again for even only one skin
>> service implementation.
>>
>>>
>>> core pipelines used in non-legacy mode allow the client code to define a
>>> stack-based thread-info block, but this won't work for 2.6.x which
>>> follows the legacy API. And this will work in 3.x when we will have
>>> gotten rid of these crappy Xenomai-specific stacks for kernel rt threads.
>>>
>>
>> I agree.
>>
> 
> 


-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:43                                             ` Philippe Gerum
@ 2012-10-10 12:45                                               ` Philippe Gerum
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 12:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:43 PM, Philippe Gerum wrote:
> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>
>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>
>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>
>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>> thread.
>>>>>>>>>>>>
>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>> talking about.
>>>>>>>>>>>
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>> of the block.
>>>>>>>>>>
>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>
>>>>>>>>> Err... absolutely right.
>>>>>>>>>
>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>> first patch is fine.
>>>>>>>>
>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>
>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>
>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>
>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>> a counter to achieve this (which saves one data reference in
>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>
>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>
>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>
>>>>>
>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>> of code:
>>>>>
>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>  		xnpod_schedule();
>>>>>
>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>> unlocking the top-most spinlock.
>>>>>
>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>
>>>>>
>>>>
>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>
>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>> scheduler locked (you added that at some point for the vxworks emulator,
>>> if I remember correctly). But without that need, we can put the XNLOCK
>>> bit in the sched->status structure, and simply test that bit with all
>>> the others in xnpod_schedule(), and forget about the need to call
>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>
>>
>> You may have to reschedule after an IRQ has interrupted the lock owner
>> inside the locked section, in which case you have to make sure to
>> reschedule when dropping the sched lock.
>>
>> This change was not
> 
> s/was not/was/
> 
>  directly related to holding the scheduler lock
>> across rescheduling points. The motivation was to plug a caveat
>> preventing fully non-preemptible sections to be built with a
>> rescheduling point in the middle: sometimes the app logic really wants
>> all operations to run un-preempted before the caller switches out.

The other reason was to allow the locking scheme to survive domain
migration.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:41                                           ` Philippe Gerum
  2012-10-10 12:43                                             ` Philippe Gerum
@ 2012-10-10 12:55                                             ` Gilles Chanteperdrix
  2012-10-10 12:57                                               ` Philippe Gerum
  1 sibling, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 12:55 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:41 PM, Philippe Gerum wrote:
> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>
>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>
>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>
>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>> thread.
>>>>>>>>>>>
>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>> talking about.
>>>>>>>>>>
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>> of the block.
>>>>>>>>>
>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>
>>>>>>>> Err... absolutely right.
>>>>>>>>
>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>> first patch is fine.
>>>>>>>
>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>
>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>
>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>
>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>> a counter to achieve this (which saves one data reference in
>>>>> xnpod_schedule() though), but this is a start:
>>>>>
>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>
>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>
>>>>
>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>> of code:
>>>>
>>>>  		xnsched_set_self_resched(curr->sched);
>>>>  		xnpod_schedule();
>>>>
>>>> It means that we will go to the full blown __xnpod_schedule when
>>>> unlocking the top-most spinlock.
>>>>
>>>> I guess what I meant is that we should have a scheduler bit that is
>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>
>>>>
>>>
>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>
>> Yes, but the reason why we have to go to __xnpod_schedule in
>> xnpod_unlock_sched() is to allow thread to be suspended with the
>> scheduler locked (you added that at some point for the vxworks emulator,
>> if I remember correctly). But without that need, we can put the XNLOCK
>> bit in the sched->status structure, and simply test that bit with all
>> the others in xnpod_schedule(), and forget about the need to call
>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>
> 
> You may have to reschedule after an IRQ has interrupted the lock owner
> inside the locked section, in which case you have to make sure to
> reschedule when dropping the sched lock.

Well, in that case the resched bit will already have been set by the irq
handler calling xnpod_resume_thread() or some other service, so you do
not have to force it.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:55                                             ` Gilles Chanteperdrix
@ 2012-10-10 12:57                                               ` Philippe Gerum
  2012-10-10 13:01                                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 12:57 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>
>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>
>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>
>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>> thread.
>>>>>>>>>>>>
>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>> talking about.
>>>>>>>>>>>
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>> of the block.
>>>>>>>>>>
>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>
>>>>>>>>> Err... absolutely right.
>>>>>>>>>
>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>> first patch is fine.
>>>>>>>>
>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>
>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>
>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>
>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>> a counter to achieve this (which saves one data reference in
>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>
>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>
>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>
>>>>>
>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>> of code:
>>>>>
>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>  		xnpod_schedule();
>>>>>
>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>> unlocking the top-most spinlock.
>>>>>
>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>
>>>>>
>>>>
>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>
>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>> scheduler locked (you added that at some point for the vxworks emulator,
>>> if I remember correctly). But without that need, we can put the XNLOCK
>>> bit in the sched->status structure, and simply test that bit with all
>>> the others in xnpod_schedule(), and forget about the need to call
>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>
>>
>> You may have to reschedule after an IRQ has interrupted the lock owner
>> inside the locked section, in which case you have to make sure to
>> reschedule when dropping the sched lock.
> 
> Well, in that case the resched bit will have been set already by the irq
> handler calling xnpod_resume_thread, or other service, you do not have
> to force it.
> 

Yes, and this is why you have to call __xnpod_schedule() regardless.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 12:57                                               ` Philippe Gerum
@ 2012-10-10 13:01                                                 ` Gilles Chanteperdrix
  2012-10-10 13:09                                                   ` Philippe Gerum
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 13:01 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 02:57 PM, Philippe Gerum wrote:
> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>
>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>> of the block.
>>>>>>>>>>>
>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>
>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>
>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>> first patch is fine.
>>>>>>>>>
>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>
>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>
>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>
>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>
>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>
>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>
>>>>>>
>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>> of code:
>>>>>>
>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>  		xnpod_schedule();
>>>>>>
>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>> unlocking the top-most spinlock.
>>>>>>
>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>
>>>>>>
>>>>>
>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>
>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>> bit in the sched->status structure, and simply test that bit with all
>>>> the others in xnpod_schedule(), and forget about the need to call
>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>
>>>
>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>> inside the locked section, in which case you have to make sure to
>>> reschedule when dropping the sched lock.
>>
>> Well, in that case the resched bit will have been set already by the irq
>> handler calling xnpod_resume_thread, or other service, you do not have
>> to force it.
>>
> 
> Yes, and this is why you have to call __xnpod_schedule() regardless.
> 
No, xnpod_schedule() is enough; it will only call __xnpod_schedule() if
the resched bit is set.
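
To make that concrete, here is a minimal sketch of the fast path I have
in mind; the field and bit names below (sched->status, XNRESCHED) are
assumptions for illustration, not necessarily the exact nucleus
definitions:

	static inline void xnpod_schedule_sketch(void)
	{
		struct xnsched *sched = xnpod_current_sched();

		/* Cheap test: no pending resched request, nothing to do. */
		if (!(sched->status & XNRESCHED))
			return;

		/* Only now pay for the full rescheduling procedure. */
		__xnpod_schedule(sched);
	}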

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:01                                                 ` Gilles Chanteperdrix
@ 2012-10-10 13:09                                                   ` Philippe Gerum
  2012-10-10 13:16                                                     ` Gilles Chanteperdrix
  2012-10-10 13:17                                                     ` Philippe Gerum
  0 siblings, 2 replies; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 13:09 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>
>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>
>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>
>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>
>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>> first patch is fine.
>>>>>>>>>>
>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>
>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>
>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>
>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>
>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>
>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>
>>>>>>>
>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>> of code:
>>>>>>>
>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>  		xnpod_schedule();
>>>>>>>
>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>> unlocking the top-most spinlock.
>>>>>>>
>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>
>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>
>>>>
>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>> inside the locked section, in which case you have to make sure to
>>>> reschedule when dropping the sched lock.
>>>
>>> Well, in that case the resched bit will have been set already by the irq
>>> handler calling xnpod_resume_thread, or other service, you do not have
>>> to force it.
>>>
>>
>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>
> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
> the resched bit is set.
> 

We are not discussing the same issue, I'm afraid. We can't optimize
the schedlock path via a scheduler flag, since we have to care about
lock nesting. The only sane option is to hold such a counter in the
current thread context, so the optimization lies entirely in how we
access this information.
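
For the sake of the discussion, the per-thread variant would roughly
look like this, preempt_disable()/preempt_enable() style; the
lock_count field and the helper names are made up for this sketch:

	static inline void thread_sched_lock_sketch(void)
	{
		xnthread_t *curr = xnpod_current_thread();

		curr->lock_count++;	/* nested sections just bump the count */
	}

	static inline void thread_sched_unlock_sketch(void)
	{
		xnthread_t *curr = xnpod_current_thread();

		if (--curr->lock_count > 0)
			return;		/* still nested, preemption stays off */

		/* Outermost unlock: honor any resched request set meanwhile. */
		xnpod_schedule();
	}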

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:09                                                   ` Philippe Gerum
@ 2012-10-10 13:16                                                     ` Gilles Chanteperdrix
  2012-10-10 13:18                                                       ` Philippe Gerum
  2012-10-10 13:17                                                     ` Philippe Gerum
  1 sibling, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 13:16 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:09 PM, Philippe Gerum wrote:
> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>
>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>
>>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>
>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>
>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>
>>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>>
>>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>
>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>
>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>>> of code:
>>>>>>>>
>>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>>  		xnpod_schedule();
>>>>>>>>
>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>
>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>>
>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>>
>>>>>
>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>> inside the locked section, in which case you have to make sure to
>>>>> reschedule when dropping the sched lock.
>>>>
>>>> Well, in that case the resched bit will have been set already by the irq
>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>> to force it.
>>>>
>>>
>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>
>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>> the resched bit is set.
>>
> 
> We are not discussing about the same issue, I'm afraid. We can't
> optimize the schedlock path via a scheduler flag since we have to care
> about lock nesting. Since the only sane option there is to hold such
> counter in the current thread context, the optimization is 100% in the
> way we access this information.
> 

Again, if we do not have to allow threads to suspend themselves while
holding the scheduler lock, the preempt_count can also live in the
sched structure.
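
In other words, something along these lines, assuming we accept that
restriction; all the names below are illustrative only, this is just a
sketch of the idea:

	static inline void sched_preempt_disable_sketch(struct xnsched *sched)
	{
		if (sched->lock_count++ == 0)
			sched->status |= XNLOCK;	/* rescheduling blocked */
	}

	static inline void sched_preempt_enable_sketch(struct xnsched *sched)
	{
		if (--sched->lock_count == 0) {
			sched->status &= ~XNLOCK;
			/* Enters __xnpod_schedule() only if a resched is pending. */
			xnpod_schedule();
		}
	}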

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:09                                                   ` Philippe Gerum
  2012-10-10 13:16                                                     ` Gilles Chanteperdrix
@ 2012-10-10 13:17                                                     ` Philippe Gerum
  1 sibling, 0 replies; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 13:17 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:09 PM, Philippe Gerum wrote:
> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>
>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>
>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>
>>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>
>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>
>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>
>>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>>
>>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>
>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>
>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>
>>>>>>>>
>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>>> of code:
>>>>>>>>
>>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>>  		xnpod_schedule();
>>>>>>>>
>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>
>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>>
>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>>
>>>>>
>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>> inside the locked section, in which case you have to make sure to
>>>>> reschedule when dropping the sched lock.
>>>>
>>>> Well, in that case the resched bit will have been set already by the irq
>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>> to force it.
>>>>
>>>
>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>
>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>> the resched bit is set.
>>
> 
> We are not discussing about the same issue, I'm afraid. We can't
> optimize the schedlock path via a scheduler flag since we have to care
> about lock nesting. Since the only sane option there is to hold such
> counter in the current thread context, the optimization is 100% in the
> way we access this information.
> 

And yes, I do understand that the ability for a thread to schedule out
while holding the scheduler lock is part of the problem, but this
ability is key to having a consistent scheduler locking scheme.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:16                                                     ` Gilles Chanteperdrix
@ 2012-10-10 13:18                                                       ` Philippe Gerum
  2012-10-10 13:19                                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 13:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:16 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 03:09 PM, Philippe Gerum wrote:
>> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>>
>>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>>
>>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>>
>>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>>
>>>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>>>
>>>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>>
>>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>>
>>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>>>> of code:
>>>>>>>>>
>>>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>>>  		xnpod_schedule();
>>>>>>>>>
>>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>>
>>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>>>
>>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>>>
>>>>>>
>>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>>> inside the locked section, in which case you have to make sure to
>>>>>> reschedule when dropping the sched lock.
>>>>>
>>>>> Well, in that case the resched bit will have been set already by the irq
>>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>>> to force it.
>>>>>
>>>>
>>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>>
>>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>>> the resched bit is set.
>>>
>>
>> We are not discussing about the same issue, I'm afraid. We can't
>> optimize the schedlock path via a scheduler flag since we have to care
>> about lock nesting. Since the only sane option there is to hold such
>> counter in the current thread context, the optimization is 100% in the
>> way we access this information.
>>
> 
> Again, if we do not have to allow threads to suspend while holding the
> scheduler lock, the preempt_count also can be in the sched structure.
> 

Sure, but this point is moot. We just _can't_ do that.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:18                                                       ` Philippe Gerum
@ 2012-10-10 13:19                                                         ` Gilles Chanteperdrix
  2012-10-10 13:57                                                           ` Philippe Gerum
  0 siblings, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 13:19 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:18 PM, Philippe Gerum wrote:
> On 10/10/2012 03:16 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 03:09 PM, Philippe Gerum wrote:
>>> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>>>
>>>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>>>
>>>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>>>
>>>>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>>>>
>>>>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>>>
>>>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>>>
>>>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>>>>> of code:
>>>>>>>>>>
>>>>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>>>>  		xnpod_schedule();
>>>>>>>>>>
>>>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>>>
>>>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>>>> simply tested in xnpod_schedule, but then we loose the ability for the
>>>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>>>> xnpod_preempt_disablle()/xnpod_preempt_enabled().
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>>>>
>>>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>>>> xnpod_unlock_sched() is to allow thread to be suspended with the
>>>>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>>>>
>>>>>>>
>>>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>>>> inside the locked section, in which case you have to make sure to
>>>>>>> reschedule when dropping the sched lock.
>>>>>>
>>>>>> Well, in that case the resched bit will have been set already by the irq
>>>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>>>> to force it.
>>>>>>
>>>>>
>>>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>>>
>>>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>>>> the resched bit is set.
>>>>
>>>
>>> We are not discussing about the same issue, I'm afraid. We can't
>>> optimize the schedlock path via a scheduler flag since we have to care
>>> about lock nesting. Since the only sane option there is to hold such
>>> counter in the current thread context, the optimization is 100% in the
>>> way we access this information.
>>>
>>
>> Again, if we do not have to allow threads to suspend while holding the
>> scheduler lock, the preempt_count also can be in the sched structure.
>>
> 
> Sure, but this point is moot. We just _can't_ do that.
> 
Except if we introduce another scheduler lock service, meant only for
spinlock-protected sections, where suspending the spinlock holder does
not make sense anyway.
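
Something like the following, say, with hypothetical
rtdm_lock_get_sched()/rtdm_lock_put_sched() entry points built on top
of the existing services; this is only a sketch of the idea:

	static inline void rtdm_lock_get_sched(rtdm_lock_t *lock)
	{
		xnpod_lock_sched();	/* no preemption inside the section */
		rtdm_lock_get(lock);
	}

	static inline void rtdm_lock_put_sched(rtdm_lock_t *lock)
	{
		rtdm_lock_put(lock);
		/* A wakeup done inside the section (e.g. rtdm_sem_up()) only
		   marked a resched; the switch happens here, lock released. */
		xnpod_unlock_sched();
	}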

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:19                                                         ` Gilles Chanteperdrix
@ 2012-10-10 13:57                                                           ` Philippe Gerum
  2012-10-10 14:52                                                             ` Gilles Chanteperdrix
  2012-10-11  8:05                                                             ` Gilles Chanteperdrix
  0 siblings, 2 replies; 42+ messages in thread
From: Philippe Gerum @ 2012-10-10 13:57 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:19 PM, Gilles Chanteperdrix wrote:
> On 10/10/2012 03:18 PM, Philippe Gerum wrote:
>> On 10/10/2012 03:16 PM, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 03:09 PM, Philippe Gerum wrote:
>>>> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>>>>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>>>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>>>>> I think the reason for 1st patch why is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Are you not not afraid of side effects with the second patch, 
>>>>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>>>>> Won't you prefer a only locally modified rtcan_virt ?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>>>>>
>>>>>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>>>>
>>>>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>>>>
>>>>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>>>>>> of code:
>>>>>>>>>>>
>>>>>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>>>>>  		xnpod_schedule();
>>>>>>>>>>>
>>>>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>>>>
>>>>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>>>>> simply tested in xnpod_schedule, but then we lose the ability for the
>>>>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>>>>> xnpod_preempt_disable()/xnpod_preempt_enable().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>>>>>
>>>>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>>>>> xnpod_unlock_sched() is to allow threads to be suspended with the
>>>>>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>>>>> inside the locked section, in which case you have to make sure to
>>>>>>>> reschedule when dropping the sched lock.
>>>>>>>
>>>>>>> Well, in that case the resched bit will have been set already by the irq
>>>>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>>>>> to force it.
>>>>>>>
>>>>>>
>>>>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>>>>
>>>>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>>>>> the resched bit is set.
>>>>>
>>>>
>>>> We are not discussing the same issue, I'm afraid. We can't
>>>> optimize the schedlock path via a scheduler flag since we have to care
>>>> about lock nesting. Since the only sane option there is to hold such
>>>> counter in the current thread context, the optimization is 100% in the
>>>> way we access this information.
>>>>
>>>
>>> Again, if we do not have to allow threads to suspend while holding the
>>> scheduler lock, the preempt_count also can be in the sched structure.
>>>
>>
>> Sure, but this point is moot. We just _can't_ do that.
>>
> Except if we invent another scheduler lock service, simply for the
> purpose of spinlocks where suspending the spinlock holder does not make
> sense anyway.
> 

Oh, well, ok. Ack.

Let's just make sure that we can fold the whole thing into a single set
of services at some point in 3.x though.

The core issue is that we should not even have to expose a scheduler
locking service to userland, but emulating traditional RTOS APIs
requires that. What a braindamage counter-feature for a real-time system
when one thinks of it.
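
For illustration only, here is a minimal, self-contained C model of the
spinlock-only scheduler lock acked above: a nesting count plus a deferred
resched flag, with the rescheduling pass running only when the outermost
lock is dropped. None of the names below are real Xenomai or RTDM
interfaces; this is a sketch of the intended behaviour, not of the
implementation.

#include <stdbool.h>
#include <stdio.h>

/* Toy per-CPU state: nesting depth of spinlock-style scheduler locks,
 * plus a flag recording that a wake-up happened inside the section. */
struct sched_model {
	int lock_count;
	bool resched_pending;
};

static struct sched_model sched;

static void reschedule(void)
{
	printf("deferred rescheduling pass runs here\n");
}

static void sched_lock(void)
{
	sched.lock_count++;		/* cheap, preempt_disable()-like */
}

static void sched_unlock(void)
{
	if (--sched.lock_count == 0 && sched.resched_pending) {
		sched.resched_pending = false;
		reschedule();		/* once, at the outermost unlock only */
	}
}

int main(void)
{
	sched_lock();			/* e.g. the rtcan_recv_list_lock section */
	sched_lock();			/* nested rtcan_socket_lock section */
	sched.resched_pending = true;	/* an rtdm_sem_up()-like wake-up occurred */
	sched_unlock();			/* inner unlock: nothing happens */
	sched_unlock();			/* outer unlock: reschedule() runs */
	return 0;
}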

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:57                                                           ` Philippe Gerum
@ 2012-10-10 14:52                                                             ` Gilles Chanteperdrix
  2012-10-11  8:05                                                             ` Gilles Chanteperdrix
  1 sibling, 0 replies; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-10 14:52 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:57 PM, Philippe Gerum wrote:
> On 10/10/2012 03:19 PM, Gilles Chanteperdrix wrote:
>> On 10/10/2012 03:18 PM, Philippe Gerum wrote:
>>> On 10/10/2012 03:16 PM, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 03:09 PM, Philippe Gerum wrote:
>>>>> On 10/10/2012 03:01 PM, Gilles Chanteperdrix wrote:
>>>>>> On 10/10/2012 02:57 PM, Philippe Gerum wrote:
>>>>>>> On 10/10/2012 02:55 PM, Gilles Chanteperdrix wrote:
>>>>>>>> On 10/10/2012 02:41 PM, Philippe Gerum wrote:
>>>>>>>>> On 10/10/2012 02:33 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 10/10/2012 02:09 PM, Philippe Gerum wrote:
>>>>>>>>>>> On 10/10/2012 01:30 PM, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 10/10/2012 12:54 PM, Philippe Gerum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/10/2012 12:25 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2012-10-10 12:07, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 10/10/2012 12:04 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>> On 2012-10-10 11:23, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>> On 10/10/2012 11:01 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>> On 2012-10-10 10:58, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>> On 10/10/2012 10:10 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>>> On 2012-10-10 10:04, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:56 AM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>>>>>>>> On 2012-10-10 09:51, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 10/10/2012 09:38 AM, Thierry Bultel wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>>>>>>>>>>>>>> The first patch does not work, the second does.
>>>>>>>>>>>>>>>>>>>>>>>> I think the reason why the 1st patch does not work is that in rtcan_virt, we have
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_get(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>>> --->        rtcan_rcv(rx_dev, &skb);
>>>>>>>>>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put(&rtcan_socket_lock);
>>>>>>>>>>>>>>>>>>>>>>>> rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> and rtcan_rcv->rtcan_rcv_deliver->rtdm_sem_up(&sock->recv_sem);
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> thus the same re-scheduling stuff with interrupts locked.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Are you not afraid of side effects with the second patch,
>>>>>>>>>>>>>>>>>>>>>>>> since you change the overall behaviour ?
>>>>>>>>>>>>>>>>>>>>>>>> Won't you prefer an only locally modified rtcan_virt?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We should ask Jan's opinion. In any case, if we adopt the second patch,
>>>>>>>>>>>>>>>>>>>>>>> we might want to try and reduce the overhead of xnpod_unlock_sched.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We were signaling the semaphore while holding a spin lock? That's a
>>>>>>>>>>>>>>>>>>>>>> clear bug. Your patch is aligning rtcan to the pattern we are also using
>>>>>>>>>>>>>>>>>>>>>> in RTnet. We just need to make sure (haven't looked at the full context
>>>>>>>>>>>>>>>>>>>>>> yet) that sock remains valid even after dropping the lock(s).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The second patch idea was to lock the scheduler while spinlocks are
>>>>>>>>>>>>>>>>>>>>> held, so that posting a semaphore while holding a spin lock is no longer
>>>>>>>>>>>>>>>>>>>>> a bug.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sounds a bit hacky,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Well, that is what the linux kernel does.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  but I think we have this pattern
>>>>>>>>>>>>>>>>>>>> (RTDM_EXECUTE_ATOMICALLY)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY is a bit of a misnomer, if you do:
>>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY(foo(); rtdm_sem_up(); bar());
>>>>>>>>>>>>>>>>>>> foo() and bar() are not executed atomically if sem_up wakes up another
>>>>>>>>>>>>>>>>>>> thread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So, I do not see how RTDM_EXECUTE_ATOMICALLY solves the issue we are
>>>>>>>>>>>>>>>>>>> talking about.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY holds the nucleus lock across the encapsulated
>>>>>>>>>>>>>>>>>> code, executing it atomically as rescheduling is postponed until the end
>>>>>>>>>>>>>>>>>> of the block.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Err... no. Absolutely not.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Err... absolutely right.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The good news is: we don't need to worry about such kind of locking. In
>>>>>>>>>>>>>>>> rtcan_raw_recvmsg, the socket is locked via the RTDM context as we are
>>>>>>>>>>>>>>>> in a handler. So it won't disappear when we drop the lock, and your
>>>>>>>>>>>>>>>> first patch is fine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which one? The first one does not seem to work because the rtdm locks
>>>>>>>>>>>>>>> seem to be nested. The second one would probably need to find a way to
>>>>>>>>>>>>>>> reduce the overhead of xnpod_unlock_sched(). What can be done, however,
>>>>>>>>>>>>>>> is adding a call to xnpod_lock_sched()/xnpod_unlock_sched() in
>>>>>>>>>>>>>>> RTDM_EXECUTE_ATOMICALLY.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Oh, I'm seeing the locking forest in rtcan now. I suppose rtcan wasn't
>>>>>>>>>>>>>> used much on SMP so far. That looks indeed unresolvable without a
>>>>>>>>>>>>>> semantical change to rtdm_lock/unlock.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But then we really need something as light-weight as preempt_enable/disable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> This is not as lightweight as it might be given that we pair a flag and
>>>>>>>>>>>>> a counter to achieve this (which saves one data reference in
>>>>>>>>>>>>> xnpod_schedule() though), but this is a start:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=aed4dfce9967e45ef7e8a8da4b6c90267ea81497
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I'm setting __xnpod_lock_sched() and __xnpod_unlock_sched() in stone
>>>>>>>>>>>>> in the nucleus API to manipulate the sched locking counter from a
>>>>>>>>>>>>> context where the nucleus lock is already held, so that RTDM can rely on
>>>>>>>>>>>>> this for RTDM_EXECUTE_ATOMICALLY().
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The problem of xnpod_unlock_sched from my point of view is this section
>>>>>>>>>>>> of code:
>>>>>>>>>>>>
>>>>>>>>>>>>  		xnsched_set_self_resched(curr->sched);
>>>>>>>>>>>>  		xnpod_schedule();
>>>>>>>>>>>>
>>>>>>>>>>>> It means that we will go to the full blown __xnpod_schedule when
>>>>>>>>>>>> unlocking the top-most spinlock.
>>>>>>>>>>>>
>>>>>>>>>>>> I guess what I meant is that we should have a scheduler bit that is
>>>>>>>>>>>> simply tested in xnpod_schedule, but then we lose the ability for the
>>>>>>>>>>>> threads with the scheduler locked to suspend themselves. So, maybe we
>>>>>>>>>>>> should define another xnpod_lock/unlock pair, maybe something like
>>>>>>>>>>>> xnpod_preempt_disable()/xnpod_preempt_enable().
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xnpod_lock_sched() really means xnpod_preempt_disable() in RTOS parlance.
>>>>>>>>>>
>>>>>>>>>> Yes, but the reason why we have to go to __xnpod_schedule in
>>>>>>>>>> xnpod_unlock_sched() is to allow threads to be suspended with the
>>>>>>>>>> scheduler locked (you added that at some point for the vxworks emulator,
>>>>>>>>>> if I remember correctly). But without that need, we can put the XNLOCK
>>>>>>>>>> bit in the sched->status structure, and simply test that bit with all
>>>>>>>>>> the others in xnpod_schedule(), and forget about the need to call
>>>>>>>>>> xnsched_set_self_resched() in xnpod_unlock_sched(). That is all I meant.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You may have to reschedule after an IRQ has interrupted the lock owner
>>>>>>>>> inside the locked section, in which case you have to make sure to
>>>>>>>>> reschedule when dropping the sched lock.
>>>>>>>>
>>>>>>>> Well, in that case the resched bit will have been set already by the irq
>>>>>>>> handler calling xnpod_resume_thread, or other service, you do not have
>>>>>>>> to force it.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, and this is why you have to call __xnpod_schedule() regardless.
>>>>>>>
>>>>>> no, xnpod_schedule is enough, it will only call __xnpod_schedule() if
>>>>>> the resched bit is set.
>>>>>>
>>>>>
>>>>> We are not discussing the same issue, I'm afraid. We can't
>>>>> optimize the schedlock path via a scheduler flag since we have to care
>>>>> about lock nesting. Since the only sane option there is to hold such
>>>>> counter in the current thread context, the optimization is 100% in the
>>>>> way we access this information.
>>>>>
>>>>
>>>> Again, if we do not have to allow threads to suspend while holding the
>>>> scheduler lock, the preempt_count also can be in the sched structure.
>>>>
>>>
>>> Sure, but this point is moot. We just _can't_ do that.
>>>
>> Except if we invent another scheduler lock service, simply for the
>> purpose of spinlocks where suspending the spinlock holder does not make
>> sense anyway.
>>
> 
> Oh, well, ok. Ack.
> 
> Let's just make sure that we can fold the whole thing into a single set
> of services at some point in 3.x though.
> 
> The core issue is that we should not even have to expose a scheduler
> locking service to userland, but emulating traditional RTOS APIs
> requires that. What a braindamage counter-feature for a real-time system
> when one thinks of it.
> 

That could be done with a parameter to xnpod_lock_sched(), but then we
would have to pass a matching parameter to xnpod_unlock_sched() too...
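
As a toy illustration of why the flag would have to travel to both ends of
the pair (the flag and every name below are invented here; none of this is
an actual Xenomai interface):

#include <stdbool.h>
#include <stdio.h>

static int  cpu_lock_depth;	/* lightweight, spinlock-style nesting */
static int  thread_lock_depth;	/* classic lock that follows the thread */
static bool resched_pending;

static void sched_lock(bool nosuspend)
{
	if (nosuspend)
		cpu_lock_depth++;	/* cheap: no thread state touched */
	else
		thread_lock_depth++;	/* classic: the holder may still suspend */
}

static void sched_unlock(bool nosuspend)
{
	/* The unlock side must know which variant it is undoing. */
	int *depth = nosuspend ? &cpu_lock_depth : &thread_lock_depth;

	if (--*depth > 0)
		return;
	if (!nosuspend || resched_pending) {
		printf("rescheduling pass runs here\n");
		resched_pending = false;
	}
}

int main(void)
{
	sched_lock(true);
	resched_pending = true;		/* a wake-up inside the section */
	sched_unlock(true);		/* must pass the same flag as sched_lock() */
	return 0;
}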

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-10 13:57                                                           ` Philippe Gerum
  2012-10-10 14:52                                                             ` Gilles Chanteperdrix
@ 2012-10-11  8:05                                                             ` Gilles Chanteperdrix
  2012-11-05 13:33                                                               ` Thierry Bultel
  1 sibling, 1 reply; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-11  8:05 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 10/10/2012 03:57 PM, Philippe Gerum wrote:
>> Except if we invent another scheduler lock service, simply for the
>> purpose of spinlocks where suspending the spinlock holder does not make
>> sense anyway.
>>
> 
> Oh, well, ok. Ack.
> 
> Let's just make sure that we can fold the whole thing into a single set
> of services at some point in 3.x though.
> 
> The core issue is that we should not even have to expose a scheduler
> locking service to userland, but emulating traditional RTOS APIs
> requires that. What a braindamage counter-feature for a real-time system
> when one thinks of it.

I think we can have our cake and eat it. The idea is to follow the
approach in the patch I sent, merged with the one you committed, but
simply, in xnpod_suspend_thread, clear the XNHELDSPIN bit (a misnomer if
used for that), and then, when we wake up, re-set the bit in the scheduler
if the thread preemption count is not 0.
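
To make this concrete, here is a rough, compilable model of that rule (the
structures and helper names below are invented for illustration and do not
match the actual Xenomai sources; only XNHELDSPIN and the preemption count
come from the discussion above):

#include <stdbool.h>
#include <stdio.h>

struct model_sched  { bool held_spin; };	/* stands in for an XNHELDSPIN-style status bit */
struct model_thread { int preempt_count; struct model_sched *sched; };

/* Voluntary suspension: drop the bit so other threads may run while
 * the lock-holding thread sleeps. */
static void suspend_thread(struct model_thread *t)
{
	if (t->preempt_count > 0)
		t->sched->held_spin = false;
	printf("suspended, held_spin=%d\n", t->sched->held_spin);
}

/* Wake-up: restore the bit if the thread still holds a scheduler lock,
 * so preemption is disabled again from its point of view. */
static void wake_up_thread(struct model_thread *t)
{
	if (t->preempt_count > 0)
		t->sched->held_spin = true;
	printf("resumed, held_spin=%d\n", t->sched->held_spin);
}

int main(void)
{
	struct model_sched sched = { .held_spin = true };
	struct model_thread t = { .preempt_count = 1, .sched = &sched };

	suspend_thread(&t);
	wake_up_thread(&t);
	return 0;
}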

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-10-11  8:05                                                             ` Gilles Chanteperdrix
@ 2012-11-05 13:33                                                               ` Thierry Bultel
  2012-11-05 18:46                                                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 42+ messages in thread
From: Thierry Bultel @ 2012-11-05 13:33 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Wolfgang Grandegger, xenomai

On 11/10/2012 10:05, Gilles Chanteperdrix wrote:
> On 10/10/2012 03:57 PM, Philippe Gerum wrote:
>>> Except if we invent another scheduler lock service, simply for the
>>> purpose of spinlocks where suspending the spinlock holder does not make
>>> sense anyway.
>>>
>>
>> Oh, well, ok. Ack.
>>
>> Let's just make sure that we can fold the whole thing into a single set
>> of services at some point in 3.x though.
>>
>> The core issue is that we should not even have to expose a scheduler
>> locking service to userland, but emulating traditional RTOS APIs
>> requires that. What a braindamage counter-feature for a real-time system
>> when one thinks of it.
> 
> I think we can have our cake and eat it. The idea is to follow the
> approach in the patch I sent, merged with the one you committed, but
> simply, in xnpod_suspend_thread, clear the XNHELDSPIN bit (a misnomer if
> used for that), and then, when we wake up, re-set the bit in the scheduler
> if the thread preemption count is not 0.
> 

Hi,
sorry to come back to this thread again, but it is not clear to me whether the
issue has been addressed or not.
I have noticed the "nucleus: introduce internal sched locking helpers"
commit, which seems related, but I am not sure.
Sorry for the noise.

Thanks
Thierry



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-11-05 13:33                                                               ` Thierry Bultel
@ 2012-11-05 18:46                                                                 ` Gilles Chanteperdrix
  2012-11-06  7:22                                                                   ` Thierry Bultel
  2012-11-06 11:59                                                                   ` Philippe Gerum
  0 siblings, 2 replies; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-11-05 18:46 UTC (permalink / raw)
  To: Thierry Bultel; +Cc: Jan Kiszka, Wolfgang Grandegger, xenomai

On 11/05/2012 02:33 PM, Thierry Bultel wrote:

> On 11/10/2012 10:05, Gilles Chanteperdrix wrote:
>> On 10/10/2012 03:57 PM, Philippe Gerum wrote:
>>>> Except if we invent another scheduler lock service, simply for the
>>>> purpose of spinlocks where suspending the spinlock holder does not make
>>>> sense anyway.
>>>>
>>>
>>> Oh, well, ok. Ack.
>>>
>>> Let's just make sure that we can fold the whole thing into a single set
>>> of services at some point in 3.x though.
>>>
>>> The core issue is that we should not even have to expose a scheduler
>>> locking service to userland, but emulating traditional RTOS APIs
>>> requires that. What a braindamage counter-feature for a real-time system
>>> when one thinks of it.
>>
>> I think we can have our cake and eat it. The idea is to follow the
>> approach in the patch I sent, merged with the one you committed, but
>> simply, in xnpod_suspend_thread, clear the XNHELDSPIN bit (a misnomer if
>> used for that), and then, when we wake up, re-set the bit in the scheduler
>> if the thread preemption count is not 0.
>>
> 
> Hi,
> sorry to come back to this thread again, but it is not clear to me whether the
> issue has been addressed or not.
> I have noticed the "nucleus: introduce internal sched locking helpers"
> commit, which seems related, but I am not sure.
> Sorry for the noise.


I have posted a patch, and await Philippe's approval. The patch may have
some dubious side effects, it is here:

http://www.xenomai.org/pipermail/xenomai/2012-October/026523.html

-- 
                                                                Gilles.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-11-05 18:46                                                                 ` Gilles Chanteperdrix
@ 2012-11-06  7:22                                                                   ` Thierry Bultel
  2012-11-06 11:59                                                                   ` Philippe Gerum
  1 sibling, 0 replies; 42+ messages in thread
From: Thierry Bultel @ 2012-11-06  7:22 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, Wolfgang Grandegger, xenomai

On 05/11/2012 19:46, Gilles Chanteperdrix wrote:
> On 11/05/2012 02:33 PM, Thierry Bultel wrote:
>
>> On 11/10/2012 10:05, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 03:57 PM, Philippe Gerum wrote:
>>>>> Except if we invent another scheduler lock service, simply for the
>>>>> purpose of spinlocks where suspending the spinlock holder does not make
>>>>> sense anyway.
>>>>>
>>>>
>>>> Oh, well, ok. Ack.
>>>>
>>>> Let's just make sure that we can fold the whole thing into a single set
>>>> of services at some point in 3.x though.
>>>>
>>>> The core issue is that we should not even have to expose a scheduler
>>>> locking service to userland, but emulating traditional RTOS APIs
>>>> requires that. What a braindamage counter-feature for a real-time system
>>>> when one thinks of it.
>>>
>>> I think we can have our cake and eat it. The idea is to follow the
>>> approach in the patch I sent, merged with the one you committed, but
>>> simply, in xnpod_suspend_thread, clear the XNHELDSPIN bit (a misnomer if
>>> used for that), and then, when we wake up, re-set the bit in the scheduler
>>> if the thread preemption count is not 0.
>>>
>>
>> Hi,
>> sorry to come back to this thread again, but it is not clear to me whether the
>> issue has been addressed or not.
>> I have noticed the "nucleus: introduce internal sched locking helpers"
>> commit, which seems related, but I am not sure.
>> Sorry for the noise.
>
>
> I have posted a patch, and await Philippe's approval. The patch may have
> some dubious side effects, it is here:
>
> http://www.xenomai.org/pipermail/xenomai/2012-October/026523.html
>

Thanks a lot,
I will also try it asap
Thierry


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-11-05 18:46                                                                 ` Gilles Chanteperdrix
  2012-11-06  7:22                                                                   ` Thierry Bultel
@ 2012-11-06 11:59                                                                   ` Philippe Gerum
  2012-11-06 14:49                                                                     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 42+ messages in thread
From: Philippe Gerum @ 2012-11-06 11:59 UTC (permalink / raw)
  To: Gilles Chanteperdrix
  Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 11/05/2012 07:46 PM, Gilles Chanteperdrix wrote:
> On 11/05/2012 02:33 PM, Thierry Bultel wrote:
> 
>> On 11/10/2012 10:05, Gilles Chanteperdrix wrote:
>>> On 10/10/2012 03:57 PM, Philippe Gerum wrote:
>>>>> Except if we invent another scheduler lock service, simply for the
>>>>> purpose of spinlocks where suspending the spinlock holder does not make
>>>>> sense anyway.
>>>>>
>>>>
>>>> Oh, well, ok. Ack.
>>>>
>>>> Let's just make sure that we can fold the whole thing into a single set
>>>> of services at some point in 3.x though.
>>>>
>>>> The core issue is that we should not even have to expose a scheduler
>>>> locking service to userland, but emulating traditional RTOS APIs
>>>> requires that. What a braindamage counter-feature for a real-time system
>>>> when one thinks of it.
>>>
>>> I think we can have our cake and eat it. The idea is to follow the
>>> approach in the patch I sent, merged with the one you committed, but
>>> simply, in xnpod_suspend_thread, clear the XNHELDSPIN bit (a misnomer if
>>> used for that), and then, when we wake up, re-set the bit in the scheduler
>>> if the thread preemption count is not 0.
>>>
>>
>> Hi,
>> sorry to come back to this thread again, but it is not clear to me whether the
>> issue has been addressed or not.
>> I have noticed the "nucleus: introduce internal sched locking helpers"
>> commit, which seems related, but I am not sure.
>> Sorry for the noise.
> 
> 
> I have posted a patch, and await Philippe's approval. The patch may have
> some dubious side effects, it is here:
> 
> http://www.xenomai.org/pipermail/xenomai/2012-October/026523.html
> 

Yes, I'm late once again. The patch has some problems, so please hold
this for now.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Xenomai] Target frozen with rtcan_virt
  2012-11-06 11:59                                                                   ` Philippe Gerum
@ 2012-11-06 14:49                                                                     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 42+ messages in thread
From: Gilles Chanteperdrix @ 2012-11-06 14:49 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Thierry Bultel, Jan Kiszka, Wolfgang Grandegger, xenomai

On 11/06/2012 12:59 PM, Philippe Gerum wrote:
> On 11/05/2012 07:46 PM, Gilles Chanteperdrix wrote:
>> On 11/05/2012 02:33 PM, Thierry Bultel wrote:
>>
>>> On 11/10/2012 10:05, Gilles Chanteperdrix wrote:
>>>> On 10/10/2012 03:57 PM, Philippe Gerum wrote:
>>>>>> Except if we invent another scheduler lock service, simply for the
>>>>>> purpose of spinlocks where suspending the spinlock holder does not make
>>>>>> sense anyway.
>>>>>>
>>>>>
>>>>> Oh, well, ok. Ack.
>>>>>
>>>>> Let's just make sure that we can fold the whole thing into a single set
>>>>> of services at some point in 3.x though.
>>>>>
>>>>> The core issue is that we should not even have to expose a scheduler
>>>>> locking service to userland, but emulating traditional RTOS APIs
>>>>> requires that. What a braindamage counter-feature for a real-time system
>>>>> when one thinks of it.
>>>>
>>>> I think we can have our cake and eat it. The idea is to follow the
>>>> approach in the patch I sent, merged with the one you committed, but
>>>> simply, in xnpod_suspend_thread, clear the XNHELDSPIN bit (a misnomer if
>>>> used for that), and then, when we wake up, re-set the bit in the scheduler
>>>> if the thread preemption count is not 0.
>>>>
>>>
>>> Hi,
>>> sorry to come back to this thread again, but it is not clear to me whether the
>>> issue has been addressed or not.
>>> I have noticed the "nucleus: introduce internal sched locking helpers"
>>> commit, which seems related, but I am not sure.
>>> Sorry for the noise.
>>
>>
>> I have posted a patch, and await Philippe's approval. The patch may have
>> some dubious side effects, it is here:
>>
>> http://www.xenomai.org/pipermail/xenomai/2012-October/026523.html
>>
> 
> Yes, I'm late once again. The patch has some problems, so please hold
> this for now.

I can remove the parts of the patches we already know are problematic,
and post a new patch if you wish.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2012-11-06 14:49 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <505EB99F.7050705@wanadoo.fr>
2012-09-23  7:42 ` [Xenomai] Target frozen with rtcan_virt Thierry Bultel
2012-09-23 13:23   ` Gilles Chanteperdrix
2012-09-23 14:57     ` Thierry Bultel
2012-09-23 14:59       ` Gilles Chanteperdrix
2012-10-09 12:45     ` Thierry Bultel
2012-10-09 21:34       ` Gilles Chanteperdrix
2012-10-09 21:41         ` Gilles Chanteperdrix
2012-10-09 21:51           ` Gilles Chanteperdrix
2012-10-10  7:38             ` Thierry Bultel
2012-10-10  7:51               ` Gilles Chanteperdrix
2012-10-10  7:56                 ` Jan Kiszka
2012-10-10  8:04                   ` Gilles Chanteperdrix
2012-10-10  8:10                     ` Jan Kiszka
2012-10-10  8:58                       ` Gilles Chanteperdrix
2012-10-10  9:01                         ` Jan Kiszka
2012-10-10  9:23                           ` Gilles Chanteperdrix
2012-10-10 10:04                             ` Jan Kiszka
2012-10-10 10:07                               ` Gilles Chanteperdrix
2012-10-10 10:25                                 ` Jan Kiszka
2012-10-10 10:54                                   ` Philippe Gerum
2012-10-10 11:30                                     ` Gilles Chanteperdrix
2012-10-10 12:09                                       ` Philippe Gerum
2012-10-10 12:33                                         ` Gilles Chanteperdrix
2012-10-10 12:41                                           ` Philippe Gerum
2012-10-10 12:43                                             ` Philippe Gerum
2012-10-10 12:45                                               ` Philippe Gerum
2012-10-10 12:55                                             ` Gilles Chanteperdrix
2012-10-10 12:57                                               ` Philippe Gerum
2012-10-10 13:01                                                 ` Gilles Chanteperdrix
2012-10-10 13:09                                                   ` Philippe Gerum
2012-10-10 13:16                                                     ` Gilles Chanteperdrix
2012-10-10 13:18                                                       ` Philippe Gerum
2012-10-10 13:19                                                         ` Gilles Chanteperdrix
2012-10-10 13:57                                                           ` Philippe Gerum
2012-10-10 14:52                                                             ` Gilles Chanteperdrix
2012-10-11  8:05                                                             ` Gilles Chanteperdrix
2012-11-05 13:33                                                               ` Thierry Bultel
2012-11-05 18:46                                                                 ` Gilles Chanteperdrix
2012-11-06  7:22                                                                   ` Thierry Bultel
2012-11-06 11:59                                                                   ` Philippe Gerum
2012-11-06 14:49                                                                     ` Gilles Chanteperdrix
2012-10-10 13:17                                                     ` Philippe Gerum
