All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xenomai-help] Strange deadlock.
@ 2007-02-09 19:35 Maksym Veremeyenko
  2007-02-09 20:54 ` Philippe Gerum
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Maksym Veremeyenko @ 2007-02-09 19:35 UTC (permalink / raw)
  To: Xenomai-help

Hi!

I found dead lock in my program and looking for reason, but i found 
something strange. Is possible mutex state looks like:
[root@domain.hid /]# cat /proc/xenomai/registry/native/mutexes/coll_VDCP_1
=locked by bus_monitor_proc[0] depth=1
+vdcp[0030]m
+bus_monitor_proc[0]

seems bus_monitor_proc[0] locked mutex and want to lock again but its 
blocked.  (i still looking for stack corruption point, but no success.) 
It have the same behavior on two configuration 2.6.17.4 + 2.2.4 and 
2.6.19.2 + 2.3.0 + 1.7-01
i checked with GDB - two task are wating for mutex.
All task perform switching to secondary mode.


Also have a questions:

1. On program start after previous crash i unable to create named mutex, 
because it [-EEXIST], after bind it's unable to operate with it because 
it locked by task from early died program, i have to restart computer. 
Any suggestions?

2. Does exist API function to detect mutex owner?



-- 
________________________________________
Maksym Veremeyenko



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-09 19:35 [Xenomai-help] Strange deadlock Maksym Veremeyenko
@ 2007-02-09 20:54 ` Philippe Gerum
  2007-02-09 21:39   ` Maksym Veremeyenko
  2007-02-09 21:14 ` Dmitry Adamushko
  2007-02-10  4:07 ` Maksym Veremeyenko
  2 siblings, 1 reply; 10+ messages in thread
From: Philippe Gerum @ 2007-02-09 20:54 UTC (permalink / raw)
  To: Maksym Veremeyenko; +Cc: Xenomai-help

On Fri, 2007-02-09 at 21:35 +0200, Maksym Veremeyenko wrote:
> Hi!
> 
> I found dead lock in my program and looking for reason, but i found 
> something strange. Is possible mutex state looks like:
> [root@domain.hid /]# cat /proc/xenomai/registry/native/mutexes/coll_VDCP_1
> =locked by bus_monitor_proc[0] depth=1
> +vdcp[0030]m
> +bus_monitor_proc[0]

This state is bugous.

> 
> seems bus_monitor_proc[0] locked mutex and want to lock again but its 
> blocked.  (i still looking for stack corruption point, but no success.) 
> It have the same behavior on two configuration 2.6.17.4 + 2.2.4 and 
> 2.6.19.2 + 2.3.0 + 1.7-01
> i checked with GDB - two task are wating for mutex.
> All task perform switching to secondary mode.

Could you strip your code to a bare minimum that still exhibits this bug
and post it here ? TIA,

> 
> 
> Also have a questions:
> 
> 1. On program start after previous crash i unable to create named mutex, 
> because it [-EEXIST], after bind it's unable to operate with it because 
> it locked by task from early died program, i have to restart computer. 
> Any suggestions?

Something like this would do:

if (rt_mutex_bind("...", &desc) == 0)
	rt_mutex_delete(&desc);

rt_mutex_create("...", &desc);

> 
> 2. Does exist API function to detect mutex owner?
> 

No, albeit rt_mutex_inquire() could be extended to return this info.

> 
> 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-09 19:35 [Xenomai-help] Strange deadlock Maksym Veremeyenko
  2007-02-09 20:54 ` Philippe Gerum
@ 2007-02-09 21:14 ` Dmitry Adamushko
  2007-02-09 22:10   ` Maksym Veremeyenko
  2007-02-10  4:07 ` Maksym Veremeyenko
  2 siblings, 1 reply; 10+ messages in thread
From: Dmitry Adamushko @ 2007-02-09 21:14 UTC (permalink / raw)
  To: Maksym Veremeyenko; +Cc: Xenomai help

> I found dead lock in my program and looking for reason, but i found
> something strange. Is possible mutex state looks like:
> [root@domain.hid /]# cat /proc/xenomai/registry/native/mutexes/coll_VDCP_1
> =locked by bus_monitor_proc[0] depth=1
> +vdcp[0030]m
> +bus_monitor_proc[0]

Just to put aside a few (quite) weird assumptions :

(1) did you have another task that has been using this mutex and has
(somehow) exited by the moment of lockup?

(2) bus_monitor_proc[0] is created once and never
exits_then_created_again while the program is running ?

TIA,

-- 
Best regards,
Dmitry Adamushko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-09 20:54 ` Philippe Gerum
@ 2007-02-09 21:39   ` Maksym Veremeyenko
  0 siblings, 0 replies; 10+ messages in thread
From: Maksym Veremeyenko @ 2007-02-09 21:39 UTC (permalink / raw)
  To: Xenomai-help

Philippe Gerum пишет:

>> seems bus_monitor_proc[0] locked mutex and want to lock again but its 
>> blocked.  (i still looking for stack corruption point, but no success.) 
>> It have the same behavior on two configuration 2.6.17.4 + 2.2.4 and 
>> 2.6.19.2 + 2.3.0 + 1.7-01
>> i checked with GDB - two task are wating for mutex.
>> All task perform switching to secondary mode.
> 
> Could you strip your code to a bare minimum that still exhibits this bug
> and post it here ? TIA,
unfortunately i can reproduce this situation only on working model with 
serial controlled device in one of this threads.

>> 1. On program start after previous crash i unable to create named mutex, 
>> because it [-EEXIST], after bind it's unable to operate with it because 
>> it locked by task from early died program, i have to restart computer. 
>> Any suggestions?
> 
> Something like this would do:
> 
> if (rt_mutex_bind("...", &desc) == 0)
> 	rt_mutex_delete(&desc);
> 
> rt_mutex_create("...", &desc);
YES! this works fine for me.


________________________________________
Maksym Veremeyenko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-09 21:14 ` Dmitry Adamushko
@ 2007-02-09 22:10   ` Maksym Veremeyenko
  2007-02-09 23:53     ` Dmitry Adamushko
  0 siblings, 1 reply; 10+ messages in thread
From: Maksym Veremeyenko @ 2007-02-09 22:10 UTC (permalink / raw)
  To: Xenomai help

Dmitry Adamushko пишет:

> Just to put aside a few (quite) weird assumptions :
> 
> (1) did you have another task that has been using this mutex and has
> (somehow) exited by the moment of lockup?
there are only 2 task that has access to that mutex.

> (2) bus_monitor_proc[0] is created once and never
> exits_then_created_again while the program is running ?
all task starting from begining of program and never restart.

i renamed mutex name twice to avoid name misspelled access and move 
throw the struct to avoid offset with another struct - nothing help.
i tried to increase stack size - not helped.

i try to move lock/unlock to macros and add some history logging.

Is possible next scenario:
1. task A lock mutex M
2. task A going to secondary mode
3. task A performing "long" io wait operation and some flags modified.
4. task B tries to lock mutex M
5. core find no task A, because its waiting in secondary mode or it 
think that task A is zombie or due to flag modification it's do not 
think task A is RT
6. mutex M is granted to task B
7. task A returns from secondary mode
8. task A unlock mutex M because it's owned
9. task B tries to unlock, and not properly cleared some flags due to 
step 8.
10. task B tries to lock mutex
11. core see that task B currently not own mutex M and put it into wait 
queue
12. task A tries to lock mutex.
13. core see that task B currently not own mutex M and put it into wait 
queue
14. deadlock here.


BTW
at some moment i got a kernel oops, not very informative but...

Feb  9 23:26:02 dev-3 kernel: BUG: unable to handle kernel NULL pointer 
dereference at virtual address 00000005
Feb  9 23:26:02 dev-3 kernel:  printing eip:
Feb  9 23:26:02 dev-3 kernel: c01b53b2
Feb  9 23:26:02 dev-3 kernel: *pde = 0ae5a001
Feb  9 23:26:02 dev-3 kernel: Oops: 0000 [#1]
Feb  9 23:26:02 dev-3 kernel: Modules linked in: ipv6 sunrpc dm_mirror 
dm_mod lp parport_pc parport ohci1394 ieee1394 floppy snd_intel8x0 
snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss i2c_i801 
snd_seq_midi_event snd_seq i2c_core snd_seq_device snd_pcm_oss 
snd_mixer_oss snd_pcm serio_raw snd_timer e100 snd mxser mii soundcore 
snd_page_alloc
Feb  9 23:26:02 dev-3 kernel: CPU:    0
Feb  9 23:26:02 dev-3 kernel: EIP:    0060:[<c01b53b2>]    Not tainted VLI
Feb  9 23:26:02 dev-3 kernel: EFLAGS: 00210246   (2.6.17.14-xenomai #1)
Feb  9 23:26:02 dev-3 kernel: EIP is at remove_proc_entry+0x22/0x1b0
Feb  9 23:26:02 dev-3 kernel: eax: 00000000   ebx: c03a90f4   ecx: 
ffffffff   edx: ce2bfb40
Feb  9 23:26:02 dev-3 kernel: esi: 00000004   edi: 00000005   ebp: 
00000000   esp: cfec9e78
Feb  9 23:26:02 dev-3 kernel: ds: 007b   es: 007b   ss: 0068
Feb  9 23:26:02 dev-3 kernel: Process events/0 (pid: 3, 
threadinfo=cfec8000 task=c1259030)
Feb  9 23:26:02 dev-3 kernel: Stack: cafed7a0 c01b51f1 0000033d 00008000 
ce2bfb40 00000005 c03a90f4 00000004
Feb  9 23:26:02 dev-3 kernel:        ce2bfb40 00000000 c014476e ce2bfb40 
cafeda20 c036352c c0363599 00000000
Feb  9 23:26:02 dev-3 kernel:        cff254a0 c036352c c0363599 ce2bfcc0 
67f8acfc eaeb76d2 75bf19d4 c1e67be3
Feb  9 23:26:02 dev-3 kernel: Call Trace:
Feb  9 23:26:02 dev-3 kernel:  <c01b51f1> proc_register+0x31/0xe0 
<c014476e> registry_proc_callback+0x17e/0x370
Feb  9 23:26:02 dev-3 kernel:  <c01644d0> free_hot_cold_page+0xc0/0x130 
  <c022f956> memmove+0x36/0x40
Feb  9 23:26:02 dev-3 kernel:  <c017b1da> drain_array+0x7a/0xe0 
<c034a5a4> schedule+0x2e4/0x690
Feb  9 23:26:02 dev-3 kernel:  <c034a5d8> schedule+0x318/0x690 
<c0123930> run_workqueue+0x90/0x110
Feb  9 23:26:02 dev-3 kernel:  <c01445f0> 
registry_proc_callback+0x0/0x370  <c0123e97> worker_thread+0x117/0x140
Feb  9 23:26:02 dev-3 kernel:  <c0111c30> default_wake_function+0x0/0x10 
  <c0123d80> worker_thread+0x0/0x140
Feb  9 23:26:02 dev-3 kernel:  <c012676d> kthread+0x9d/0xd0  <c01266d0> 
kthread+0x0/0xd0
Feb  9 23:26:02 dev-3 kernel:  <c0100cb5> kernel_thread_helper+0x5/0x10
Feb  9 23:26:02 dev-3 kernel: Code: 76 00 8d bc 27 00 00 00 00 55 57 56 
53 83 ec 18 85 d2 89 54 24 10 89 44 24 14 0f 84 35 01 00 00 8b 7c 24 14 
31 c0 b9 ff ff ff ff <f2> ae f7 d1 49 8b 5c 24 10 89 cd 83 c3 38 8b 3b 
85 ff 0f 84 c8
Feb  9 23:26:02 dev-3 kernel: EIP: [<c01b53b2>] 
remove_proc_entry+0x22/0x1b0 SS:ESP 0068:cfec9e78


-- 
________________________________________
Maksym Veremeyenko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-09 22:10   ` Maksym Veremeyenko
@ 2007-02-09 23:53     ` Dmitry Adamushko
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Adamushko @ 2007-02-09 23:53 UTC (permalink / raw)
  To: Maksym Veremeyenko; +Cc: Xenomai help

> i try to move lock/unlock to macros and add some history logging.
>
> Is possible next scenario:
> 1. task A lock mutex M
> 2. task A going to secondary mode
> 3. task A performing "long" io wait operation and some flags modified.
> ...

What do you mean by "some flags modified" ?
Anyway, the "scenario" you have described is just not how it all works.


> BTW
> at some moment i got a kernel oops, not very informative but...
> ...
> Feb  9 23:26:02 dev-3 kernel: CPU:    0
> Feb  9 23:26:02 dev-3 kernel: EIP:    0060:[<c01b53b2>]    Not tainted VLI
> Feb  9 23:26:02 dev-3 kernel: EFLAGS: 00210246   (2.6.17.14-xenomai #1)
> Feb  9 23:26:02 dev-3 kernel: EIP is at remove_proc_entry+0x22/0x1b0
> ...
> <c014476e> registry_proc_callback+0x17e/0x370

Does it happen at the time your application deletes some objects (e.g.
mutexes) ?

If it's not possible to get a simpler code fragment out of your
application to illustrate this problem, then maybe you could describe
some logic behind your application..
(how often each task acquires a lock / how long does it hold it and
what kind of actions does it do at this time, etc.)...

TIA,

-- 
Best regards,
Dmitry Adamushko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-09 19:35 [Xenomai-help] Strange deadlock Maksym Veremeyenko
  2007-02-09 20:54 ` Philippe Gerum
  2007-02-09 21:14 ` Dmitry Adamushko
@ 2007-02-10  4:07 ` Maksym Veremeyenko
  2007-02-10 10:34   ` Jan Kiszka
  2 siblings, 1 reply; 10+ messages in thread
From: Maksym Veremeyenko @ 2007-02-10  4:07 UTC (permalink / raw)
  To: Xenomai-help

Hi!

I added some logging to mutex log operation and serial port operations 
(executed in secondary mode):

--- skipped ---
#ifdef _DEBUG_
struct log_mutex_opers
{
     RT_TASK_INFO task_info_before;
     RT_TASK_INFO task_info_after;
     RT_MUTEX_INFO mutex_info;
     unsigned char oper;
     RTIME oper_in;
     RTIME oper_out;
     int ser_trans_count;
     RTIME ser_trans_in[128];
     RTIME ser_trans_out[128];
};
#endif /* _DEBUG_ */
--- skipped ---
#ifdef _DEBUG_
#define lock_history_lenght 1000
     struct log_mutex_opers logs[2*lock_history_lenght];
     unsigned long logs_c1;
     unsigned long logs_c2;
#endif /* _DEBUG_ */
--- skipped ---
#define SER_TRANS_LOG(C, T1, T2)                                \
{                                                               \
     int __i = C->logs[0].ser_trans_count;                       \
     C->logs[0].ser_trans_in[__i] = T1;                          \
     C->logs[0].ser_trans_out[__i] = T2;                         \
     C->logs[0].ser_trans_count++;                               \
}
#define _SAVE_STATE_COLLECTION(C, T, O, S)                      \
{                                                               \
     RTIME __t2 = rt_timer_tsc();                                \
     int __i;                                                    \
     for(__i = (2*lock_history_lenght - 1); __i > 0; __i--)      \
         C->logs[__i] = C->logs[__i - 1];                        \
     memset(&C->logs[0], 0, sizeof(struct log_mutex_opers));     \
     rt_task_inquire(NULL, &C->logs[0].task_info_after);         \
     rt_mutex_inquire(&C->lock, &C->logs[0].mutex_info);         \
     C->logs[0].oper = O;                                        \
     C->logs[0].oper_out = __t2;                                 \
     C->logs[0].oper_in = T;                                     \
     C->logs[0].task_info_before = S;                            \
}

#define _LOCK_COLLECTION(C, I)                                  \
{                                                               \
     RTIME __t1 = rt_timer_tsc();                                \
     RT_TASK_INFO __s;                                           \
     rt_task_inquire(NULL, &__s);                                \
     RT_MUTEX_INFO __lock_state;                                 \
     rt_mutex_inquire(&C->lock, &__lock_state);                  \
     rt_mutex_lock(&C->lock ,TM_INFINITE );                      \
     _SAVE_STATE_COLLECTION(C, __t1, 'L', __s);                  \
     I++;                                                        \
};

#define LOCK_COLLECTION_NR(C)                                   \
     _LOCK_COLLECTION(C, C->logs_c1)

#define LOCK_COLLECTION(C, R)                                   \
     _LOCK_COLLECTION(C, C->logs_c2)

#define UNLOCK_COLLECTION(C)                                    \
{                                                               \
     RT_TASK_INFO __s;                                           \
     rt_task_inquire(NULL, &__s);                                \
     _SAVE_STATE_COLLECTION(C, 0, 'U', __s);                     \
     rt_mutex_unlock(&C->lock);                                  \
};
--- skipped ---

After deadlock reached, array has entries ( i listed here last 6 entries 
- one pair is lock/unlock registration):

------------------------------------------------------------------

{{task_info_before = {bprio = 80, cprio = 80, status = 3146624, relpoint 
= 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
task_info_after = {bprio = 80,
       cprio = 80, status = 3146624, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
mutex_info = {lockcnt = 1,
       nwaiters = 1, name = 
"coll_VDCP_1\000\231?\023??8N?\202\002\000\000H\037H?4?\020?"}, oper = 
85 'U', oper_in = 0,
     oper_out = 5397829215210, ser_trans_count = 0, ser_trans_in = {0 
<repeats 128 times>}, ser_trans_out = {
       0 <repeats 128 times>}},

{task_info_before = {bprio = 80, cprio = 80, status = 3146112, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
task_info_after = {bprio = 80,
       cprio = 80, status = 3146112, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
mutex_info = {lockcnt = 1,
       nwaiters = 0, name = 
"coll_VDCP_1\000\231?\023??8N?\202\002\000\000H\037H?4?\020?"}, oper = 
76 'L',
     oper_in = 5397792126988, oper_out = 5397792191552, ser_trans_count 
= 1, ser_trans_in = {5397810469917,
       0 <repeats 127 times>}, ser_trans_out = {5397828986260, 0 
<repeats 127 times>}},



{task_info_before = {bprio = 50,
       cprio = 50, status = 3146112, relpoint = 0, name = 
"bus_monitor_proc[0]\000?8N?\202\002\000\000H_??"},
     task_info_after = {bprio = 50, cprio = 50, status = 3146112, 
relpoint = 0,
       name = "bus_monitor_proc[0]\000?8N?\202\002\000\000H_??"}, 
mutex_info = {lockcnt = 1, nwaiters = 0,
       name = 
"coll_VDCP_1\000\231?\023??8N?\202\002\000\000H_??4?\020?"}, oper = 85 
'U', oper_in = 0,
     oper_out = 5397750468006, ser_trans_count = 1, ser_trans_in = 
{5397718529990, 0 <repeats 127 times>}, ser_trans_out = {
       5397770396236, 0 <repeats 127 times>}},

{task_info_before = {bprio = 50, cprio = 50, status = 3146112, relpoint = 0,
       name = "bus_monitor_proc[0]\000?8N?\202\002\000\000H_??"}, 
task_info_after = {bprio = 50, cprio = 50,
       status = 3146112, relpoint = 0, name = 
"bus_monitor_proc[0]\000?8N?\202\002\000\000H_??"}, mutex_info = 
{lockcnt = 1,
       nwaiters = 0, name = 
"coll_VDCP_1\000\231?\023??8N?\202\002\000\000H_??4?\020?"}, oper = 76 'L',
     oper_in = 5397732302828, oper_out = 5397732320388, ser_trans_count 
= 0, ser_trans_in = {0 <repeats 128 times>},
     ser_trans_out = {0 <repeats 128 times>}},




{task_info_before = {bprio = 80, cprio = 80, status = 3146112, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
task_info_after = {bprio = 80,
       cprio = 80, status = 3146112, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
mutex_info = {lockcnt = 1,
       nwaiters = 0, name = 
"coll_VDCP_1\000\231?\023??8N?\202\002\000\000H\037H?4?\020?"}, oper = 
85 'U', oper_in = 0,
     oper_out = 5397682575612, ser_trans_count = 1, ser_trans_in = 
{5397701107940, 0 <repeats 127 times>}, ser_trans_out = {
       5397718501300, 0 <repeats 127 times>}},

{task_info_before = {bprio = 80, cprio = 80, status = 3146112, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
task_info_after = {bprio = 80,
       cprio = 80, status = 3146112, relpoint = 0,
       name = 
"vdcp[0030]m\000\226\002\000\000\231?\023??8N?\202\002\000\000H\037H?"}, 
mutex_info = {lockcnt = 1,
       nwaiters = 0, name = 
"coll_VDCP_1\000\231?\023??8N?\202\002\000\000H\037H?4?\020?"}, oper = 
76 'L',
     oper_in = 5397664238700, oper_out = 5397664304073, ser_trans_count 
= 0, ser_trans_in = {0 <repeats 128 times>},
     ser_trans_out = {0 <repeats 128 times>}},

--- skipped ---

------------------------------------------------------------------

I tried to summarize this info and build time sorted table of actions 
(task B - "bus_monitor_proc[0]", task V - "vdcp[0030]m"):

V	5397829215210	rt_mutex_unlock:ENTER
V	5397828986260	serial_transact:EXIT
V	5397810469917	serial_transact:ENTER
V	5397792191552	rt_mutex_lock:EXIT
V	5397792126988	rt_mutex_lock:ENTER
V	5397770396236	serial_transact:EXIT
B	5397750468006	rt_mutex_unlock:ENTER
B	5397732320388	rt_mutex_lock:EXIT
B	5397732302828	rt_mutex_lock:ENTER
V	5397718529990	serial_transact:ENTER
V	5397718501300	serial_transact:EXIT
V	5397701107940	serial_transact:ENTER
V	5397682575612	rt_mutex_unlock:ENTER
V	5397664304073	rt_mutex_lock:EXIT
V	5397664238700	rt_mutex_lock:ENTER
V	5397618710764	rt_mutex_unlock:ENTER
V	5397600363677	rt_mutex_lock:EXIT
V	5397600295612	rt_mutex_lock:ENTER

As i mentioned early task B was executed during serial operation in task 
V, and may be it could cause switching status of task V from 3146112 
(0x00300180) to 3146624 (0x00300380) - rised bit 0x00000200 (XNRELAX 
0x00000200 Relaxed shadow thread (blocking bit) )

I am not specialist in Xenomai architecture, but it's not looks like 
stack corruption in my program.

I rather going to rewrite some code part to use rtserX instead of /dev/ttyXX


________________________________________
Maksym Veremeyenko



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-10  4:07 ` Maksym Veremeyenko
@ 2007-02-10 10:34   ` Jan Kiszka
  2007-02-11 18:28     ` Maksym Veremeyenko
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2007-02-10 10:34 UTC (permalink / raw)
  To: Maksym Veremeyenko; +Cc: Xenomai-help

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

[Disclaimer: I'm just jumping on this thread without having read all
details.]

Maksym Veremeyenko wrote:
> ...
> I tried to summarize this info and build time sorted table of actions
> (task B - "bus_monitor_proc[0]", task V - "vdcp[0030]m"):
> 
> V    5397829215210    rt_mutex_unlock:ENTER
> V    5397828986260    serial_transact:EXIT
> V    5397810469917    serial_transact:ENTER

You are holding the mutex across the serial transaction, i.e. a
secondary mode switch, right? If you haven't set
CONFIG_XENO_OPT_RPIDISABLE, you might be are affected by the bug in the
priority coupling between Xenomai and the Linux kernel. Already tried
latest v2.3.x SVN?

> V    5397792191552    rt_mutex_lock:EXIT
> V    5397792126988    rt_mutex_lock:ENTER
> V    5397770396236    serial_transact:EXIT
> B    5397750468006    rt_mutex_unlock:ENTER
> B    5397732320388    rt_mutex_lock:EXIT
> B    5397732302828    rt_mutex_lock:ENTER
> V    5397718529990    serial_transact:ENTER
> V    5397718501300    serial_transact:EXIT
> V    5397701107940    serial_transact:ENTER
> V    5397682575612    rt_mutex_unlock:ENTER
> V    5397664304073    rt_mutex_lock:EXIT
> V    5397664238700    rt_mutex_lock:ENTER
> V    5397618710764    rt_mutex_unlock:ENTER
> V    5397600363677    rt_mutex_lock:EXIT
> V    5397600295612    rt_mutex_lock:ENTER
> 
> As i mentioned early task B was executed during serial operation in task
> V, and may be it could cause switching status of task V from 3146112
> (0x00300180) to 3146624 (0x00300380) - rised bit 0x00000200 (XNRELAX
> 0x00000200 Relaxed shadow thread (blocking bit) )
> 
> I am not specialist in Xenomai architecture, but it's not looks like
> stack corruption in my program.
> 
> I rather going to rewrite some code part to use rtserX instead of
> /dev/ttyXX

Given that you hold a lock across the serial transaction, switching to a
deterministic driver is certainly wiser. Still, understanding your
original problem would be good too so that we get aware of any potential
new bug in Xenomai and are able to fix it. So please try the SVN version
with your current code as well.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-10 10:34   ` Jan Kiszka
@ 2007-02-11 18:28     ` Maksym Veremeyenko
  2007-02-11 19:05       ` Philippe Gerum
  0 siblings, 1 reply; 10+ messages in thread
From: Maksym Veremeyenko @ 2007-02-11 18:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai-help

Jan Kiszka пишет:

> You are holding the mutex across the serial transaction, i.e. a
> secondary mode switch, right? If you haven't set
> CONFIG_XENO_OPT_RPIDISABLE, you might be are affected by the bug in the
> priority coupling between Xenomai and the Linux kernel. Already tried
> latest v2.3.x SVN?
i tried with r2170 and rewrited code for RT serial - situation the same.

Solution for me is to change tasks priority - set lower priority to task 
that perform read/write serial driver operations under lock.

Perhaps i should rewrite code to exclude I/O operations under locked 
mutex at all.


-- 
________________________________________
Maksym Veremeyenko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Xenomai-help] Strange deadlock.
  2007-02-11 18:28     ` Maksym Veremeyenko
@ 2007-02-11 19:05       ` Philippe Gerum
  0 siblings, 0 replies; 10+ messages in thread
From: Philippe Gerum @ 2007-02-11 19:05 UTC (permalink / raw)
  To: Maksym Veremeyenko; +Cc: Xenomai-help, Jan Kiszka

On Sun, 2007-02-11 at 20:28 +0200, Maksym Veremeyenko wrote:
> Jan Kiszka пишет:
> 
> > You are holding the mutex across the serial transaction, i.e. a
> > secondary mode switch, right? If you haven't set
> > CONFIG_XENO_OPT_RPIDISABLE, you might be are affected by the bug in the
> > priority coupling between Xenomai and the Linux kernel. Already tried
> > latest v2.3.x SVN?
> i tried with r2170 and rewrited code for RT serial - situation the same.
> 
> Solution for me is to change tasks priority - set lower priority to task 
> that perform read/write serial driver operations under lock.
> 
> Perhaps i should rewrite code to exclude I/O operations under locked 
> mutex at all.
> 

In any case, possibly sleeping while holding a mutex and moreover in the
Linux context is not recommended, to say the least. This said, could you
try just replacing your mutex with a simple counting semaphore (RT_SEM)
and report the outcome? This would give us mome hints about what could
be involved in the issue. TIA,

> 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-02-11 19:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-09 19:35 [Xenomai-help] Strange deadlock Maksym Veremeyenko
2007-02-09 20:54 ` Philippe Gerum
2007-02-09 21:39   ` Maksym Veremeyenko
2007-02-09 21:14 ` Dmitry Adamushko
2007-02-09 22:10   ` Maksym Veremeyenko
2007-02-09 23:53     ` Dmitry Adamushko
2007-02-10  4:07 ` Maksym Veremeyenko
2007-02-10 10:34   ` Jan Kiszka
2007-02-11 18:28     ` Maksym Veremeyenko
2007-02-11 19:05       ` Philippe Gerum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.