* A potential Xenomai Mutex issue
@ 2019-08-22 18:42 DIAO, Hanson
  2019-08-23  6:47 ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: DIAO, Hanson @ 2019-08-22 18:42 UTC (permalink / raw)
  To: xenomai

Hi all,



I hope you are doing well. I am currently working on a critical deadlock issue with the Xenomai library (version 2.6.4). I found that the Xenomai lock count is not reliable after we call rt_mutex_release. I have included the relevant output below; I hope a developer can help me fix this issue. I know that this version is EOL, but we still use it. Thank you so much.



Issue 1:

Before Mutex Lock:   mutex addr = 0xb7c059e8, count = 0, owner = 0     (status before rt_mutex_acquire)
After Mutex Lock:    mutex addr = 0xb7c059e8, count = 1, owner = 2bd   (status after rt_mutex_acquire; everything is correct in this scenario)

Before Mutex Unlock: mutex addr = 0xb7c059e8, count = 1, owner = 2bd   (status before rt_mutex_release)
After Mutex Unlock:  mutex addr = 0xb7c059e8, count = 1, owner = 0     (status after rt_mutex_release; the lock count does not look correct here)

Issue 2:

When our task takes the lock recursively, the mutex lock count should be more than 1, but it is still 1.
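
To make issue 2 concrete, here is a minimal reproducer sketch of what we expect. The task setup is illustrative only (the names, the priority and the use of rt_printf are my assumptions, not our real application):

#include <rtdk.h>
#include <native/task.h>
#include <native/mutex.h>

int main(void)
{
        RT_TASK self;
        RT_MUTEX m;

        rt_print_auto_init(1);
        /* Attach the calling thread to the native skin, so the
         * user-space fast path is used at all. */
        rt_task_shadow(&self, "lockcnt-test", 50, 0);

        rt_mutex_create(&m, "test-mutex");
        rt_mutex_acquire(&m, TM_INFINITE);      /* first acquisition */
        rt_mutex_acquire(&m, TM_INFINITE);      /* recursive acquisition */

        /* rt_printf rather than printf, so the task is not migrated
         * to secondary mode while holding the mutex. */
        rt_printf("lockcnt = %d\n", m.lockcnt); /* expected 2, we observe 1 */

        rt_mutex_release(&m);
        rt_mutex_release(&m);
        rt_mutex_delete(&m);
        return 0;
}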



For issue 1, I suspect something is wrong in the release function (the code below); I am not sure if it is the root cause.



int rt_mutex_release(RT_MUTEX *mutex)
{
#ifdef CONFIG_XENO_FASTSYNCH
        unsigned long status;
        xnhandle_t cur;

        cur = xeno_get_current();
        if (cur == XN_NO_HANDLE)
                return -EPERM;

        status = xeno_get_current_mode();
        if (unlikely(status & XNOTHER))
                /* See rt_mutex_acquire_inner() */
                goto do_syscall;

        if (unlikely(xnsynch_fast_owner_check(mutex->fastlock, cur) != 0))
                return -EPERM;

        if (mutex->lockcnt > 1) {
                mutex->lockcnt--;
                return 0;
        }

        if (likely(xnsynch_fast_release(mutex->fastlock, cur)))
                return 0;

do_syscall:
#endif /* CONFIG_XENO_FASTSYNCH */

        return XENOMAI_SKINCALL1(__native_muxid, __native_mutex_release, mutex);
}
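
Looking at the fast path above, my (unverified) reading of why the dump still shows count = 1 after a successful release:

        if (likely(xnsynch_fast_release(mutex->fastlock, cur)))
                /* Assumption on my side: lockcnt is left at 1 here and is
                 * only reinitialized by the next successful acquire, so a
                 * dump taken right after release still reports count = 1,
                 * even though the owner word has already been cleared. */
                return 0;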







For the mutex lock function, I am confused by the comment quoted below; I am not sure whether it supports recursive locking.

static int rt_mutex_acquire_inner(RT_MUTEX *mutex, RTIME timeout, xntmode_t mode)
{
        int err;
#ifdef CONFIG_XENO_FASTSYNCH
        unsigned long status;
        xnhandle_t cur;

        cur = xeno_get_current();
        if (cur == XN_NO_HANDLE)
                return -EPERM;

        /*
         * We track resource ownership for non real-time shadows in
         * order to handle the auto-relax feature, so we must always
         * obtain them via a syscall.
         */
        status = xeno_get_current_mode();
        if (unlikely(status & XNOTHER))
                goto do_syscall;

        if (likely(!(status & XNRELAX))) {
                err = xnsynch_fast_acquire(mutex->fastlock, cur);
                if (likely(!err)) {
                        mutex->lockcnt = 1;
                        return 0;
                }

                if (err == -EBUSY) {
                        if (mutex->lockcnt == UINT_MAX)
                                return -EAGAIN;

                        mutex->lockcnt++;
                        return 0;
                }

                if (timeout == TM_NONBLOCK && mode == XN_RELATIVE)
                        return -EWOULDBLOCK;
        } else if (xnsynch_fast_owner_check(mutex->fastlock, cur) == 0) {
                /*
                 * The application is buggy as it jumped to secondary mode
                 * while holding the mutex. Nevertheless, we have to keep the
                 * mutex state consistent.
                 *
                 * We make no efforts to migrate or warn here. There is
                 * XENO_DEBUG(SYNCH_RELAX) to catch such bugs.
                 */
                if (mutex->lockcnt == UINT_MAX)
                        return -EAGAIN;

                mutex->lockcnt++;
                return 0;
        }
do_syscall:
#endif /* CONFIG_XENO_FASTSYNCH */

        err = XENOMAI_SKINCALL3(__native_muxid,
                                __native_mutex_acquire, mutex, mode, &timeout);

#ifdef CONFIG_XENO_FASTSYNCH
        if (!err)
                mutex->lockcnt = 1;
#endif /* CONFIG_XENO_FASTSYNCH */

        return err;
}
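
For what it is worth, my (possibly wrong) reading of the recursion handling in this function:

        /* Fast path: xnsynch_fast_acquire() returning -EBUSY appears to be
         * the recursive case (the owner word already holds our handle), and
         * lockcnt is incremented there, so recursion looks supported.
         *
         * Syscall path: on success, lockcnt is unconditionally reset to 1,
         * even if this was a recursive re-acquisition. My assumption: if a
         * task ever reaches the do_syscall branch (XNOTHER tasks, a relaxed
         * task, or contention), the recursion depth recorded in the
         * user-space lockcnt is lost. */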







* Re: A potential Xenomai Mutex issue
  2019-08-22 18:42 A potential Xenomai Mutex issue DIAO, Hanson
@ 2019-08-23  6:47 ` Jan Kiszka
  2019-08-23 14:02   ` DIAO, Hanson
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2019-08-23  6:47 UTC (permalink / raw)
  To: DIAO, Hanson, xenomai

On 22.08.19 20:42, DIAO, Hanson via Xenomai wrote:
> Hi all,
> 
> 
> 
> I hope you are doing well. I am currently working on a critical deadlock issue with the Xenomai library (version 2.6.4). I found that the Xenomai lock count is not reliable after we call rt_mutex_release. I have included the relevant output below; I hope a developer can help me fix this issue. I know that this version is EOL, but we still use it. Thank you so much.
> 

This is on an ARMv7 multicore target, right? Are you already able to reproduce 
the issue reliably, possibly in a synthetic environment? Or does your whole 
stack have to run on the target for a long time to trigger this? Is the mutex 
shared between multiple processes or just between threads of the same process?

Next point: you are on 2.6.4 while the last release was 2.6.5. It contained e.g. 
8047147aff9d (posix/mutex: handle recursion count completely in user-space). 
Maybe something analogous was needed for native as well. And then you could 
look at what happened mutex-wise in 3.x to check that you are not missing a 
conceptual fix in 2.6.
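
For example, something along these lines should list the candidate changes (paths per the 2.6 tree layout, adjust as needed):

  # mutex-related changes between the two 2.6 releases
  git log --oneline v2.6.4..v2.6.5 -- src/skin/posix/mutex.c src/skin/native/mutex.c

  # and the later history of the unified 3.x implementation
  git log --oneline -- lib/cobalt/mutex.c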

> 
> 
> Issue 1:
> 
> Before Mutex Lock:   mutex addr = 0xb7c059e8, count = 0, owner = 0     (status before rt_mutex_acquire)
> After Mutex Lock:    mutex addr = 0xb7c059e8, count = 1, owner = 2bd   (status after rt_mutex_acquire; everything is correct in this scenario)
> 
> Before Mutex Unlock: mutex addr = 0xb7c059e8, count = 1, owner = 2bd   (status before rt_mutex_release)
> After Mutex Unlock:  mutex addr = 0xb7c059e8, count = 1, owner = 0     (status after rt_mutex_release; the lock count does not look correct here)
> 

You seem to be looking at the wrong data structure. You need to examine the 
RT_MUTEX_PLACEHOLDER fields.

> 
> 
> Issue 2:
> 
> When our task takes the lock recursively, the mutex lock count should be more than 1, but it is still 1.
> 
> 
> 
> For issue 1, I suspect something is wrong in the release function (the code below); I am not sure if it is the root cause.
> 

Don't use HTML emails on public lists. They often get filtered, at the latest 
on the receiver side.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* RE: A potential Xenomai Mutex issue
  2019-08-23  6:47 ` Jan Kiszka
@ 2019-08-23 14:02   ` DIAO, Hanson
  2019-08-23 14:16     ` Jan Kiszka
  2019-08-23 14:18     ` DIAO, Hanson
  0 siblings, 2 replies; 11+ messages in thread
From: DIAO, Hanson @ 2019-08-23 14:02 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai

Hi Jan,

Thank you for your reply. I will answer the questions one by one.

Q: This is on an ARMv7 multicore target, right?
HD: This is a PowerPC target.

Q: Are you already able to reproduce the issue reliably, possibly in a synthetic environment?
HD: It reproduces every time, for both the first issue and the second issue (the recursive lock count should be more than 1).

Q: Or does your whole stack have to run on the target for a long time to trigger this?
HD: I got this issue while the system was in the initialization stage. It is easy to trigger, and it happens every time.

Q: Is the mutex shared between multiple processes or just between threads of the same process?
HD: The mutex is shared only within one process with multiple tasks.

Q: Maybe something analogous was needed for native as well. And then you could look at what happened mutex-wise in 3.x to check that you are not missing a conceptual fix in 2.6.
HD: I will check the commit message. I compared version 2.6.4 with version 2.6.5; the user-space mutex code seems to be the same.

Q: You seem to be looking at the wrong data structure. You need to examine the RT_MUTEX_PLACEHOLDER fields.
HD: The data structure I am looking at is the RT_MUTEX_PLACEHOLDER; I attached the code below.

typedef struct rt_mutex_placeholder {

        xnhandle_t opaque;

#ifdef CONFIG_XENO_FASTSYNCH
        xnarch_atomic_t *fastlock;

        int lockcnt;
#endif /* CONFIG_XENO_FASTSYNCH */

} RT_MUTEX_PLACEHOLDER;



* Re: A potential Xenomai Mutex issue
  2019-08-23 14:02   ` DIAO, Hanson
@ 2019-08-23 14:16     ` Jan Kiszka
  2019-08-23 14:29       ` DIAO, Hanson
  2019-08-23 14:18     ` DIAO, Hanson
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2019-08-23 14:16 UTC (permalink / raw)
  To: DIAO, Hanson (DI PA CI RC R&D SW2); +Cc: xenomai

On 23.08.19 16:02, DIAO, Hanson (DI PA CI RC R&D SW2) wrote:
> Hi Jan,
> 
> Thank you for your reply. I will answer the questions one by one.
> 
> Q: This is on an ARMv7 multicore target, right?
> HD: This is a PowerPC target.
> 
> Q: Are you already able to reproduce the issue reliably, possibly in a synthetic environment?
> HD: It reproduces every time, for both the first issue and the second issue (the recursive lock count should be more than 1).
> 

Your dump was talking about "count = 1", but the counter variable is called 
"lockcnt".

> Q: Or does your whole stack have to run on the target for a long time to trigger this?
> HD: I got this issue while the system was in the initialization stage. It is easy to trigger, and it happens every time.

Can you extract a simple (and, thus, shareable) test case from that?

> 
> Q: Is the mutex shared between multiple processes or just between threads of the same process?
> HD: The mutex is shared only within one process with multiple tasks.
> 
> Q: Maybe something analogous was needed for native as well. And then you could look at what happened mutex-wise in 3.x to check that you are not missing a conceptual fix in 2.6.
> HD: I will check the commit message. I compared version 2.6.4 with version 2.6.5; the user-space mutex code seems to be the same.

Yes, the mutex patch in 2.6.5 was targeting only the posix skin. I didn't look 
into the details, but maybe that fix should have been applied to the native 
implementation as well. The problem is that at that point development had moved 
on to 3.x, where there is only one implementation (which received such a fix as well).

> 
> Q: You seem to be looking at the wrong data structure. You need to examine the RT_MUTEX_PLACEHOLDER fields.
> HD: The data structure I am looking at is the RT_MUTEX_PLACEHOLDER; I attached the code below.
> 
> typedef struct rt_mutex_placeholder {
> 
>          xnhandle_t opaque;
> 
> #ifdef CONFIG_XENO_FASTSYNCH
>          xnarch_atomic_t *fastlock;
> 
>          int lockcnt;
> #endif /* CONFIG_XENO_FASTSYNCH */
> 
> } RT_MUTEX_PLACEHOLDER;

See my remark above on lockcnt.

Jan


-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* RE: A potential Xenomai Mutex issue
  2019-08-23 14:02   ` DIAO, Hanson
  2019-08-23 14:16     ` Jan Kiszka
@ 2019-08-23 14:18     ` DIAO, Hanson
  1 sibling, 0 replies; 11+ messages in thread
From: DIAO, Hanson @ 2019-08-23 14:18 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai

Hi Jan,

I checked the commit (8047147aff9d). Our platform uses a different mutex implementation (./src/skin/native/mutex.c), and that commit only touches the POSIX code. Thank you so much.

Best Regards,
Hanson Diao



* RE: A potential Xenomai Mutex issue
  2019-08-23 14:16     ` Jan Kiszka
@ 2019-08-23 14:29       ` DIAO, Hanson
  2019-08-23 15:23         ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: DIAO, Hanson @ 2019-08-23 14:29 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai

Hi Jan,

I attached my code here. This is only the lock function; the unlock function is similar.


        printf("Before Mutex Lock Mutext addr = %p,count = %d, owner = %x\n",
                mpMutex,
                mpMutex->lockcnt,
                xnarch_atomic_get(mpMutex->fastlock));
        int err = rt_mutex_acquire( mpMutex, (RTIME)TM_INFINITE );
        if ( err &&
                 // During boot-up and shutdown we run single-threaded
                 // so there is no need to lock an semaphore.
                 !(rc_system_state() != SYSTEM_RUNNING && err == -EPERM) )
        {
                rc_xeno_log( LOG_ERROR , "rt_mutex_acquire" , err);
        }
        printf("After Mutex Lock Mutext addr = %p,count = %d, owner = %x\n",
                mpMutex,
                mpMutex->lockcnt,
                xnarch_atomic_get(mpMutex->fastlock));


For issue 2 the test case is very simple. The lock sequence is as below; after the ReadReg function I checked the lockcnt, and it is 1.
int writeReg()
{
     SMILock();
     ReadReg();
     ......
     SMIUnlock();
}

int ReadReg()
{
     SMILock();
     ........
     ......
     SMIUnlock();
}



* Re: A potential Xenomai Mutex issue
  2019-08-23 14:29       ` DIAO, Hanson
@ 2019-08-23 15:23         ` Jan Kiszka
  2019-08-23 15:49           ` DIAO, Hanson
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2019-08-23 15:23 UTC (permalink / raw)
  To: DIAO, Hanson (DI PA CI RC R&D SW2); +Cc: xenomai

On 23.08.19 16:29, DIAO, Hanson (DI PA CI RC R&D SW2) wrote:
> Hi Jan,
> 
> I attached my code here. This is only the lock function; the unlock function is similar.
> 
> 
>          printf("Before Mutex Lock Mutext addr = %p,count = %d, owner = %x\n",
>                  mpMutex,
>                  mpMutex->lockcnt,
>                  xnarch_atomic_get(mpMutex->fastlock));
>          int err = rt_mutex_acquire( mpMutex, (RTIME)TM_INFINITE );
>          if ( err &&
>                   // During boot-up and shutdown we run single-threaded
>                   // so there is no need to lock an semaphore.
>                   !(rc_system_state() != SYSTEM_RUNNING && err == -EPERM) )
>          {
>                  rc_xeno_log( LOG_ERROR , "rt_mutex_acquire" , err);
>          }
>          printf("After Mutex Lock Mutext addr = %p,count = %d, owner = %x\n",
>                  mpMutex,
>                  mpMutex->lockcnt,
>                  xnarch_atomic_get(mpMutex->fastlock));

OK, now I understand the relation between "count" and "lockcnt". Thanks.

Again, for the deadlock case, can you reproduce it with synthetic patterns and 
share them?

> 
> 
> For issue 2 the test case is very simple. The lock sequence is as below; after the ReadReg function I checked the lockcnt, and it is 1.
> int writeReg()
> {
>       SMILock();
>       ReadReg();

So you are reading lockcnt here? Then 1 is obviously the expected value. Is 
owner (fastlock) 0 here?

>       ......
>       SMIUnlock();
> }
> 
> int ReadReg()
> {
>       SMILock();

If you read it here, it should be 2 in a recursive case.

Jan

>       ........
>       ......
>       SMIUnlock();
> }
> 

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* RE: A potential Xenomai Mutex issue
  2019-08-23 15:23         ` Jan Kiszka
@ 2019-08-23 15:49           ` DIAO, Hanson
  2019-08-23 16:18             ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: DIAO, Hanson @ 2019-08-23 15:49 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai

Hi Jan,

Thank you for your reply. Please see my following comments. Thank you so much.


For issue 2 the test case is very simple; the lock sequence is as below. Inside ReadReg, after the second SMILock, I checked the lockcnt: it is still 1.
int writeReg()
{
     SMILock();
     ReadReg();
     ......
     SMIUnlock();
}

int ReadReg()
{
     SMILock();
     /*  Check the lockcnt here. It is still 1; it should be 2. */
     ........
     ......
     SMIUnlock();
}



* Re: A potential Xenomai Mutex issue
  2019-08-23 15:49           ` DIAO, Hanson
@ 2019-08-23 16:18             ` Jan Kiszka
  2019-08-23 17:27               ` DIAO, Hanson
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2019-08-23 16:18 UTC (permalink / raw)
  To: DIAO, Hanson (DI PA CI RC R&D SW2); +Cc: xenomai

On 23.08.19 17:49, DIAO, Hanson (DI PA CI RC R&D SW2) wrote:
> Hi Jan,
> 
> Thank you for your reply. Please see my following comments. Thank you so much.
> 
> 
> For issue 2 the test case is very simple; the lock sequence is as below. Inside ReadReg, after the second SMILock, I checked the lockcnt: it is still 1.
> int writeReg()
> {
>       SMILock();
>       ReadReg();
>       ......
>       SMIUnlock();
> }
> 
> int ReadReg()
> {
>       SMILock();
>       /*  Check the lockcnt here. It is still 1; it should be 2. */
>       ........
>       ......
>       SMIUnlock();
> }

Is the thread doing any migrations to secondary mode between the entry of 
writeReg and the checking of lockcnt?
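
One way to check, as a sketch (native skin assumed, handler body illustrative): enable T_WARNSW, so the task receives SIGXCPU on any migration to secondary mode:

#include <signal.h>
#include <stdio.h>
#include <native/task.h>

static void warn_on_relax(int sig)
{
        /* Invoked whenever this task migrates to secondary mode. */
        fprintf(stderr, "secondary-mode migration detected\n");
}

/* ... in the task, before taking the locks: */
        signal(SIGXCPU, warn_on_relax);
        rt_task_set_mode(0, T_WARNSW, NULL);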

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* RE: A potential Xenomai Mutex issue
  2019-08-23 16:18             ` Jan Kiszka
@ 2019-08-23 17:27               ` DIAO, Hanson
  2019-08-23 18:17                 ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: DIAO, Hanson @ 2019-08-23 17:27 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai

No, the thread was not doing any migrations to secondary mode.



* Re: A potential Xenomai Mutex issue
  2019-08-23 17:27               ` DIAO, Hanson
@ 2019-08-23 18:17                 ` Jan Kiszka
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2019-08-23 18:17 UTC (permalink / raw)
  To: DIAO, Hanson (DI PA CI RC R&D SW2); +Cc: xenomai

On 23.08.19 19:27, DIAO, Hanson (DI PA CI RC R&D SW2) wrote:
> No, the thread was not doing any migrations to secondary mode.
> 

Then I would suggest using a debugger to step through this fairly simple, 
nicely single-threaded case. That should reveal whether you are really taking 
the same lock, or why rt_mutex_acquire decides not to increment lockcnt.
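
For instance (binary and toolchain names are placeholders), with gdbserver on the target:

  # on the target
  gdbserver :2345 ./your_app

  # on the host
  powerpc-linux-gnu-gdb ./your_app
  (gdb) target remote <target-ip>:2345
  (gdb) break rt_mutex_acquire
  (gdb) continue
  # at each hit, inspect the placeholder fields:
  (gdb) print mutex->lockcnt
  (gdb) print *mutex->fastlock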

Jan


-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



end of thread (newest: 2019-08-23 18:17 UTC)

Thread overview: 11+ messages
-- links below jump to the message on this page --
2019-08-22 18:42 A potential Xenomai Mutex issue DIAO, Hanson
2019-08-23  6:47 ` Jan Kiszka
2019-08-23 14:02   ` DIAO, Hanson
2019-08-23 14:16     ` Jan Kiszka
2019-08-23 14:29       ` DIAO, Hanson
2019-08-23 15:23         ` Jan Kiszka
2019-08-23 15:49           ` DIAO, Hanson
2019-08-23 16:18             ` Jan Kiszka
2019-08-23 17:27               ` DIAO, Hanson
2019-08-23 18:17                 ` Jan Kiszka
2019-08-23 14:18     ` DIAO, Hanson
