* [Xenomai] fault in ppd_lookup_inner
@ 2015-03-25  8:21 Paolo Minazzi
  2015-03-25  8:36 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Minazzi @ 2015-03-25  8:21 UTC (permalink / raw)
  To: xenomai

Hello all,
I'm stressing an ARM i.MX6 board with
- kernel 3.0.35
- I-pipe 1.18-13
- xenomai 2.6.3

After some hours of stress testing I see the fault shown at the bottom of
this email.
I always see this fault, and not only on one board (so I think it is not a
hardware problem).
The PC is always at ppd_lookup_inner+0xec/0x160.

I have checked Xenomai 2.6.4 and tried adding its fixes, but there are
no changes to that part of the code.

STRESS-TEST (called test-lib)
=======================================================================================
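#include <stdio.h>
#include <unistd.h>      /* fork() */
#include <sys/mman.h>
#include <sys/wait.h>    /* wait() */
#include <native/task.h>
#include <dlfcn.h>

#define N 20

int main(int argc, char *argv[])
{
        int i, pid, status, cnt = 0;

        while (1)
        {
                /* Spawn N children; each one becomes a Xenomai shadow
                   task, dlopens a library and exits. */
                for (i = 0; i < N; i++)
                {
                        pid = fork();
                        if (!pid)
                        {
                                char name[32];
                                mlockall(MCL_CURRENT | MCL_FUTURE);

                                sprintf(name, "test-lib-%d", i);
                                rt_task_shadow(NULL, name, 0, 0);

                                /* Exit status is 0 if dlopen succeeded, 1 otherwise. */
                                return (dlopen("/lib/libvncserver.so.0.0.0", RTLD_LAZY) == NULL);
                        }
                }
                /* Reap all N children before starting the next batch. */
                for (i = 0; i < N; i++)
                {
                        wait(&status);
                }
                cnt += N;
                printf("cnt=%d\n", cnt);
                fflush(stdout);
        }
}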

FAULT
=======================================================================================
Unable to handle kernel paging request at virtual address 4032022c
pgd = 80004000
[4032022c] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT
Modules linked in: mchpar1xxx imx6xendriverCAN_1 imx6xendriverCAN_0
CPU: 0    Not tainted  (3.0.35-2666-gbdde708 #564)
PC is at ppd_lookup_inner+0xec/0x160
LR is at ppd_lookup+0x20/0x30
pc : [<800ca744>]    lr : [<800ca7d8>]    psr: 80000013
sp : 84d8be50  ip : 8b97add8  fp : 00000001
r10: 9fde0300  r9 : 80923998  r8 : 00000b30
r7 : 9ff6e000  r6 : 8b97adb8  r5 : 9fde0300  r4 : 9ff6e024
r3 : 4032022c  r2 : 84d8be68  r1 : 84d8be60  r0 : 4032022c
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
Control: 10c53c7d  Table: 2fa40059  DAC: 00000015
Process test-lib-17 (pid: 7787, stack limit = 0x84d8a2e8)
Stack: (0x84d8be50 to 0x84d8c000)
be40:                                     809239d0 809518a0 809518a0 80923928
be60: 80141ea0 9ff6e024 00000000 9fde0300 80141ea0 800cbb80 80141ea0 8014eeac
be80: 80946d50 85cd3790 00000000 85cd3790 84ba8d80 9fde0300 84d8bee0 80141ea0
bea0: 00000059 00000071 00000000 809239d0 809518a0 809518a0 80923928 00000b30
bec0: 80923998 80923998 00000001 800ac454 20000013 00000000 00008000 00000000
bee0: 0000000f 00000044 809518a0 9fde0300 ffff7fff ffffffff 00000000 84ba8d80
bf00: 9fde0300 9fde0300 9fde0334 800442e8 84d8a000 84d8a000 00000000 8006eb9c
bf20: 84ba8d80 9fde0300 84d8a000 80072db8 84d8a000 84ba8d80 000000f8 00000000
bf40: 84d8a000 84ba8d80 000000f8 800744d4 00000200 00000001 00000009 0000002c
bf60: 809518a0 84d8bfb0 fffffdff 00000000 84d8a000 2acde760 000000f8 800442e8
bf80: 84d8a000 00000000 00000000 80074de8 00000000 000700de 2acde760 80074ea8
bfa0: 00000000 80044100 000700de 2acde760 00000000 000700ca 2abed4c0 00000000
bfc0: 000700de 2acde760 2acde760 000000f8 00000000 00000000 2ab33000 00000000
bfe0: 000000f8 7ee16c6c 2ac6ffc3 2ac19276 60000030 00000000 00000000 00000000
[<800ca744>] (ppd_lookup_inner+0xec/0x160) from [<80141ea0>] (remove_vma+0x54/0x6c)
Code: e592c000 e153000c 9a00000d e1a03000 (e5930000)
---[ end trace c1c53546cece2a33 ]---
Fixing recursive fault but reboot is needed!
=======================================================================================

The system continues to work; it does not freeze.
Do you have any idea?
Thanks for your time

Paolo
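
A note on reading the oops above, offered as a sketch rather than a
diagnosis. The word in parentheses on the "Code:" line (e5930000) is the
instruction at the faulting PC. Decoded as an ARM (A32) single data
transfer, it is "ldr r0, [r3]", and r3 holds 4032022c, the same value as
the faulting virtual address. The helper names below are made up for
illustration, and the 0x80000000 kernel/user boundary is an assumption:
it is only inferred from the kernel addresses in the trace (pc, lr and
pgd all start at 0x80...), not taken from this kernel's configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: kernel space starts at 0x80000000 (inferred from the
 * addresses in the trace, not from the kernel .config). */
#define ASSUMED_PAGE_OFFSET 0x80000000u

/* Fields of an A32 LDR/STR instruction with an immediate offset. */
struct arm_ldr_str {
        bool is_load;      /* L bit (20): 1 = LDR, 0 = STR */
        bool byte_access;  /* B bit (22): 1 = byte, 0 = word */
        bool add_offset;   /* U bit (23): 1 = base + imm, 0 = base - imm */
        unsigned rn;       /* base register, bits 19..16 */
        unsigned rd;       /* transferred register, bits 15..12 */
        unsigned imm12;    /* immediate offset, bits 11..0 */
};

/* Returns true and fills *out if insn is LDR/STR with immediate offset. */
static bool decode_ldr_str_imm(uint32_t insn, struct arm_ldr_str *out)
{
        if ((insn >> 28) == 0xFu)            /* unconditional space: not LDR/STR */
                return false;
        if (((insn >> 25) & 0x7u) != 0x2u)   /* bits 27..25 must be 010 */
                return false;

        out->is_load     = (insn >> 20) & 1u;
        out->byte_access = (insn >> 22) & 1u;
        out->add_offset  = (insn >> 23) & 1u;
        out->rn          = (insn >> 16) & 0xFu;
        out->rd          = (insn >> 12) & 0xFu;
        out->imm12       = insn & 0xFFFu;
        return true;
}

/* True if addr lies in the kernel half under the assumed split. */
static bool is_kernel_address(uint32_t addr)
{
        return addr >= ASSUMED_PAGE_OFFSET;
}
```

With the values from the trace, decode_ldr_str_imm(0xe5930000, &d)
reports a word load through r3 with a zero offset, and
is_kernel_address(0x4032022c) is false. Under the stated assumption, the
kernel was following a pointer that lies in the user half of the address
space.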




* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  8:21 [Xenomai] fault in ppd_lookup_inner Paolo Minazzi
@ 2015-03-25  8:36 ` Gilles Chanteperdrix
  2015-03-25  8:59   ` Paolo Minazzi
  0 siblings, 1 reply; 10+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-25  8:36 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On Wed, Mar 25, 2015 at 09:21:24AM +0100, Paolo Minazzi wrote:
> Hello all,
> I'm stressing an ARM i.MX6 board with
> - kernel 3.0.35
> - I-pipe 1.18-13
> - xenomai 2.6.3

Do you observe the same behaviour with Xenomai 2.6.4 ?

-- 
					    Gilles.



* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  8:36 ` Gilles Chanteperdrix
@ 2015-03-25  8:59   ` Paolo Minazzi
  2015-03-25  9:03     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Minazzi @ 2015-03-25  8:59 UTC (permalink / raw)
  To: xenomai

On 25/03/2015 09:36, Gilles Chanteperdrix wrote:
> On Wed, Mar 25, 2015 at 09:21:24AM +0100, Paolo Minazzi wrote:
>> Hello all,
>> I'm stressing an ARM i.MX6 board with
>> - kernel 3.0.35
>> - I-pipe 1.18-13
>> - xenomai 2.6.3
> Do you observe the same behaviour with Xenomai 2.6.4 ?
>
I have added all the fixes from 2.6.4 to my 2.6.3 (going by the changelog),
but I have the same behaviour.
It is not easy to see the problem.
Paolo






* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  8:59   ` Paolo Minazzi
@ 2015-03-25  9:03     ` Gilles Chanteperdrix
  2015-03-25  9:07       ` Gilles Chanteperdrix
  2015-03-25  9:11       ` Paolo Minazzi
  0 siblings, 2 replies; 10+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-25  9:03 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On Wed, Mar 25, 2015 at 09:59:52AM +0100, Paolo Minazzi wrote:
> On 25/03/2015 09:36, Gilles Chanteperdrix wrote:
> >On Wed, Mar 25, 2015 at 09:21:24AM +0100, Paolo Minazzi wrote:
> >>Hello all,
> >>I'm stressing an ARM i.MX6 board with
> >>- kernel 3.0.35
> >>- I-pipe 1.18-13
> >>- xenomai 2.6.3
> >Do you observe the same behaviour with Xenomai 2.6.4 ?
> >
> I have added all the fixes from 2.6.4 to my 2.6.3 (going by the changelog),
> but I have the same behaviour.
> It is not easy to see the problem.

Why not simply test 2.6.4, with everything, not only what appears
as fixes?

Also, do you have the same problem if you dlopen something more
innocuous than libvncserver ?

-- 
					    Gilles.



* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  9:03     ` Gilles Chanteperdrix
@ 2015-03-25  9:07       ` Gilles Chanteperdrix
  2015-03-25  9:11       ` Paolo Minazzi
  1 sibling, 0 replies; 10+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-25  9:07 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On Wed, Mar 25, 2015 at 10:03:54AM +0100, Gilles Chanteperdrix wrote:
> On Wed, Mar 25, 2015 at 09:59:52AM +0100, Paolo Minazzi wrote:
> > On 25/03/2015 09:36, Gilles Chanteperdrix wrote:
> > >On Wed, Mar 25, 2015 at 09:21:24AM +0100, Paolo Minazzi wrote:
> > >>Hello all,
> > >>I'm stressing an ARM i.MX6 board with
> > >>- kernel 3.0.35
> > >>- I-pipe 1.18-13
> > >>- xenomai 2.6.3
> > >Do you observe the same behaviour with Xenomai 2.6.4 ?
> > >
> > I have added all the fixes from 2.6.4 to my 2.6.3 (going by the changelog),
> > but I have the same behaviour.
> > It is not easy to see the problem.
> 
> Why not simply test 2.6.4, with everything, not only what appears
> as fixes?
> 
> Also, do you have the same problem if you dlopen something more
> innocuous than libvncserver ?

Also, please upgrade the I-pipe patch to the latest one contained in
Xenomai 2.6.4. Upgrading to a mainline version would also help
find out whether the problem is in the I-pipe patch for 3.0.

-- 
					    Gilles.



* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  9:03     ` Gilles Chanteperdrix
  2015-03-25  9:07       ` Gilles Chanteperdrix
@ 2015-03-25  9:11       ` Paolo Minazzi
  2015-03-25  9:20         ` Gilles Chanteperdrix
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Minazzi @ 2015-03-25  9:11 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On 25/03/2015 10:03, Gilles Chanteperdrix wrote:
> On Wed, Mar 25, 2015 at 09:59:52AM +0100, Paolo Minazzi wrote:
>> On 25/03/2015 09:36, Gilles Chanteperdrix wrote:
>>> On Wed, Mar 25, 2015 at 09:21:24AM +0100, Paolo Minazzi wrote:
>>>> Hello all,
>>>> I'm stressing an ARM i.MX6 board with
>>>> - kernel 3.0.35
>>>> - I-pipe 1.18-13
>>>> - xenomai 2.6.3
>>> Do you observe the same behaviour with Xenomai 2.6.4 ?
>>>
>> I have added all the fixes from 2.6.4 to my 2.6.3 (going by the changelog),
>> but I have the same behaviour.
>> It is not easy to see the problem.
> Why not simply test 2.6.4, with everything, not only what appears
> as fixes?
>
> Also, do you have the same problem if you dlopen something more
> innocuous than libvncserver ?
>
I understand ... I ported the fixes to avoid compatibility problems.
I made only ksrc changes, so my user space is 100% compatible.
I understand that you do not agree with me because you prefer a clean 2.6.4.
I can try another library. Do you mean a smaller library?
This is a strange test. I'm trying it because in the past I have had
some memory corruption that made the system unstable.
I would like to be sure about it.
Paolo





* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  9:11       ` Paolo Minazzi
@ 2015-03-25  9:20         ` Gilles Chanteperdrix
  2015-03-25  9:51           ` Paolo Minazzi
  0 siblings, 1 reply; 10+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-25  9:20 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On Wed, Mar 25, 2015 at 10:11:13AM +0100, Paolo Minazzi wrote:
> On 25/03/2015 10:03, Gilles Chanteperdrix wrote:
> >On Wed, Mar 25, 2015 at 09:59:52AM +0100, Paolo Minazzi wrote:
> >>On 25/03/2015 09:36, Gilles Chanteperdrix wrote:
> >>>On Wed, Mar 25, 2015 at 09:21:24AM +0100, Paolo Minazzi wrote:
> >>>>Hello all,
> >>>>I'm stressing an ARM i.MX6 board with
> >>>>- kernel 3.0.35
> >>>>- I-pipe 1.18-13
> >>>>- xenomai 2.6.3
> >>>Do you observe the same behaviour with Xenomai 2.6.4 ?
> >>>
> >>I have added all the fixes from 2.6.4 to my 2.6.3 (going by the changelog),
> >>but I have the same behaviour.
> >>It is not easy to see the problem.
> >Why not simply test 2.6.4, with everything, not only what appears
> >as fixes?
> >
> >Also, do you have the same problem if you dlopen something more
> >innocuous than libvncserver ?
> >
> I understand ... I ported the fixes to avoid compatibility problems.
> I made only ksrc changes, so my user space is 100% compatible.

There is no compatibility problem between versions of a stable
branch. 2.6.4 is compatible with 2.6.0, 2.6.1, 2.6.2, 2.6.3. It is
even ABI compatible, you do not even have to recompile your
applications.

> I understand that you do not agree with me because you prefer a clean 2.6.4.
> I can try another library. Do you mean a smaller library?

Any library that everybody has on his system, to allow reproducing
the issue. You could also try replacing the dlopen with a nanosleep.

> This is a strange test. I'm trying it because in the past I have had some
> memory corruption that made the system unstable.
> I would like to be sure about it.

I do not doubt that you observed an issue. I would simply like to be
sure that it has not already been fixed in Xenomai or the I-pipe
patch.

-- 
					    Gilles.
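
The suggestion above to swap the dlopen for a nanosleep could look
roughly like the following. This is a sketch, not the test anyone in this
thread ran: run_batch is a hypothetical helper name, the 1 ms sleep is
arbitrary, and the Xenomai call is left as a comment so that the snippet
builds without the Xenomai headers. On a target, rt_task_shadow() from
<native/task.h> would be called where indicated, as in the original test.

```c
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork n children. Each child locks its memory, sleeps for 1 ms and
 * exits with status 0. Returns the number of children that were reaped
 * with a clean exit status.
 */
static int run_batch(int n)
{
        int i, forked = 0, ok = 0, status;

        for (i = 0; i < n; i++) {
                pid_t pid = fork();

                if (pid < 0)
                        break;          /* fork failed: reap what we have */
                if (pid == 0) {
                        struct timespec ts = { 0, 1000000L }; /* 1 ms */

                        /* May fail without privileges; ignored here. */
                        (void)mlockall(MCL_CURRENT | MCL_FUTURE);

                        /* On Xenomai 2.6: rt_task_shadow(NULL, name, 0, 0); */

                        nanosleep(&ts, NULL);   /* stands in for dlopen() */
                        _exit(0);
                }
                forked++;
        }

        for (i = 0; i < forked; i++) {
                if (wait(&status) > 0 &&
                    WIFEXITED(status) && WEXITSTATUS(status) == 0)
                        ok++;
        }
        return ok;
}
```

Calling run_batch(20) in an endless loop and printing a running total
mirrors the structure of the original test-lib program.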



* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  9:20         ` Gilles Chanteperdrix
@ 2015-03-25  9:51           ` Paolo Minazzi
  2015-03-25 13:28             ` Philippe Gerum
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Minazzi @ 2015-03-25  9:51 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

>> I understand ... I ported the fixes to avoid compatibility problems.
>> I made only ksrc changes, so my user space is 100% compatible.
> There is no compatibility problem between versions of a stable
> branch. 2.6.4 is compatible with 2.6.0, 2.6.1, 2.6.2, 2.6.3. It is
> even ABI compatible, you do not even have to recompile your
> applications.
>
>> I understand that you do not agree with me because you prefer a clean 2.6.4.
>> I can try another library. Do you mean a smaller library?
> Any library that everybody has on his system, to allow reproducing
> the issue. You could also try replacing the dlopen with a nanosleep.
>
>> This is a strange test. I'm trying it because in the past I have had some
>> memory corruption that made the system unstable.
>> I would like to be sure about it.
> I do not doubt that you observed an issue. I would simply like to be
> sure that it has not already been fixed in Xenomai or the I-pipe
> patch.
>
I will run more tests.
If I discover something I will write to the mailing list.
Thanks again,
Paolo




* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25  9:51           ` Paolo Minazzi
@ 2015-03-25 13:28             ` Philippe Gerum
  2015-03-25 14:23               ` Paolo Minazzi
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Gerum @ 2015-03-25 13:28 UTC (permalink / raw)
  To: Paolo Minazzi, Gilles Chanteperdrix; +Cc: xenomai

On 03/25/2015 10:51 AM, Paolo Minazzi wrote:
>>> I understand ... I ported the fixes to avoid compatibility problems.
>>> I made only ksrc changes, so my user space is 100% compatible.
>> There is no compatibility problem between versions of a stable
>> branch. 2.6.4 is compatible with 2.6.0, 2.6.1, 2.6.2, 2.6.3. It is
>> even ABI compatible, you do not even have to recompile your
>> applications.
>>
>>> I understand that you do not agree with me because you prefer a clean
>>> 2.6.4.
>>> I can try another library. Do you mean a smaller library?
>> Any library that everybody has on his system, to allow reproducing
>> the issue. You could also try replacing the dlopen with a nanosleep.
>>
>>> This is a strange test. I'm trying it because in the past I have had
>>> some memory corruption that made the system unstable.
>>> I would like to be sure about it.
>> I do not doubt that you observed an issue. I would simply like to be
>> sure that it has not already been fixed in Xenomai or the I-pipe
>> patch.
>>
> I will run more tests.
> If I discover something I will write to the mailing list.

3.0.35-fsl for imx6 has multiple issues of its own, particularly in the
SMP case. In addition, the pipeline patch over this one belongs to the
legacy series, which also has bugs that were fixed in recent I-pipe
series for 3.x kernels.

Typically, the way process cleanup events are dealt with in the pipeline
has been fixed to close a race. Running with CONFIG_DEBUG_PAGEALLOC
enabled might reveal some of these issues, but not all of them.

I tried your test code on imx6q (3.18.2) and x86_64 (3.14.33) without
any issue after > 100,000 iterations, dlopening libm instead of
libvncserver.

-- 
Philippe.



* Re: [Xenomai] fault in ppd_lookup_inner
  2015-03-25 13:28             ` Philippe Gerum
@ 2015-03-25 14:23               ` Paolo Minazzi
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Minazzi @ 2015-03-25 14:23 UTC (permalink / raw)
  To: Philippe Gerum, Gilles Chanteperdrix; +Cc: xenomai

On 25/03/2015 14:28, Philippe Gerum wrote:
> On 03/25/2015 10:51 AM, Paolo Minazzi wrote:
>>>> I understand ... I ported the fixes to avoid compatibility problems.
>>>> I made only ksrc changes, so my user space is 100% compatible.
>>> There is no compatibility problem between versions of a stable
>>> branch. 2.6.4 is compatible with 2.6.0, 2.6.1, 2.6.2, 2.6.3. It is
>>> even ABI compatible, you do not even have to recompile your
>>> applications.
>>>
>>>> I understand that you do not agree with me because you prefer a clean
>>>> 2.6.4.
>>>> I can try another library. Do you mean a smaller library?
>>> Any library that everybody has on his system, to allow reproducing
>>> the issue. You could also try replacing the dlopen with a nanosleep.
>>>
>>>> This is a strange test. I'm trying it because in the past I have had
>>>> some memory corruption that made the system unstable.
>>>> I would like to be sure about it.
>>> I do not doubt that you observed an issue. I would simply like to be
>>> sure that it has not already been fixed in Xenomai or the I-pipe
>>> patch.
>>>
>> I will run more tests.
>> If I discover something I will write to the mailing list.
> 3.0.35-fsl for imx6 has multiple issues of its own, particularly in the
> SMP case. In addition, the pipeline patch over this one belongs to the
> legacy series, which also has bugs that were fixed in recent I-pipe
> series for 3.x kernels.
>
> Typically, the way process cleanup events are dealt with in the pipeline
> has been fixed to close a race. Running with CONFIG_DEBUG_PAGEALLOC
> enabled might reveal some of these issues, but not all of them.
>
> I tried your test code on imx6q (3.18.2) and x86_64 (3.14.33) without
> any issue after > 100,000 iterations, dlopening libm instead of
> libvncserver.
>
My board is not SMP.
I realized that the I-pipe patch for imx6 on kernel 3.0.35 is not very
stable: after some tests it is possible to see memory corruption and
system instability.

After porting some parts of the I-pipe patch for 3.5 to my 3.0.35, the
system became much more stable. Specifically, I ported:
[Xenomai][PATCH 1/2] ipipe: Rework and simplify __ipipe_pin_vma
[Xenomai][PATCH 2/2] ipipe: Fault in locked vmas after changing the protection flags

To be precise, I ran my example (dlopen) with three other tests in parallel:
- a CAN bus loop between can0 and can1 at 1 Mbit/s, using a real-time
  version of the flexcan driver
- a test that continuously does malloc/free (real-time task)
- 10 real-time tasks that do nothing

I realize that the system is heavily stressed and will never be used
this way, but it is a way to gauge the level of stability.

In the end, after 24 hours, 7 boards continue to run without any problem.
Only one board stopped with the ppd_lookup_inner fault.

Maybe the external interrupts (from the CAN bus driver) create some problem.
The driver generates several interrupts per millisecond.
I think that RTDM is not as well tested as timer-driven tasks, but this is
only my opinion and I could be wrong.
The driver is very simple, so I believe it does not introduce bugs.

I will try not running the CAN bus test to see if this bug vanishes.

Thanks
Paolo
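
For reference, a malloc/free stress loop of the kind mentioned in the
list of parallel tests can double as a corruption detector if every block
is filled with a known pattern and checked before it is freed. The
following is a minimal sketch under stated assumptions, not the test
actually run on these boards: churn, SLOTS and the fill pattern are made
up for illustration, and the real-time task setup is omitted.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 64 /* number of blocks kept alive at any time (arbitrary) */

/* Fill byte for a slot; differs between neighbouring slots. */
static unsigned char pattern_for(size_t slot)
{
        return (unsigned char)(0xA5u ^ slot);
}

/* Returns 1 if every byte of the block still holds its pattern. */
static int block_intact(const unsigned char *p, size_t len, size_t slot)
{
        size_t i;

        for (i = 0; i < len; i++)
                if (p[i] != pattern_for(slot))
                        return 0;
        return 1;
}

/*
 * Allocate, fill, verify and free blocks of varying size.
 * Returns the number of blocks found modified, or -1 if malloc failed.
 */
static long churn(unsigned long rounds)
{
        unsigned char *blk[SLOTS] = { NULL };
        size_t len[SLOTS] = { 0 };
        unsigned int seed = 12345u;     /* tiny LCG so that sizes vary */
        unsigned long r;
        long corrupted = 0;
        size_t s;

        for (r = 0; r < rounds; r++) {
                s = r % SLOTS;
                if (blk[s] != NULL) {
                        /* Check the old block before giving it back. */
                        if (!block_intact(blk[s], len[s], s))
                                corrupted++;
                        free(blk[s]);
                        blk[s] = NULL;
                }
                seed = seed * 1103515245u + 12345u;
                len[s] = 16 + (seed >> 16) % 4096;
                blk[s] = malloc(len[s]);
                if (blk[s] == NULL) {
                        corrupted = -1;
                        break;
                }
                memset(blk[s], pattern_for(s), len[s]);
        }

        /* Verify and release whatever is still allocated. */
        for (s = 0; s < SLOTS; s++) {
                if (blk[s] != NULL) {
                        if (corrupted >= 0 && !block_intact(blk[s], len[s], s))
                                corrupted++;
                        free(blk[s]);
                }
        }
        return corrupted;
}
```

A non-zero return from churn() would mean that something other than this
loop wrote into its heap blocks. A zero return proves nothing about
kernel-side corruption such as the ppd_lookup_inner fault.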



