* [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
@ 2011-12-23 17:33 Lennart Sorensen
  2011-12-23 18:17 ` Philippe Gerum
  2012-01-04 13:34 ` Philippe Gerum
  0 siblings, 2 replies; 12+ messages in thread
From: Lennart Sorensen @ 2011-12-23 17:33 UTC (permalink / raw)
  To: xenomai

After spending quite a while trying to explain how things like /bin/echo
could possibly segfault, I finally discovered that the new feature in
xenomai 2.6.0 (new when moving from 2.4.10 that is) of having preemptible
context switches is what is corrupting the state of random linux processes
once in a while.

After turning the option off, I haven't seen a single crash, just as with 2.4.10.

So something subtle is wrong with this option.

It appears to be most likely to occur (possibly only likely) when
xenomai is handling interrupts.

It seems that getting an interrupt in the middle of a context switch at
the wrong time corrupts the process that is being switched to or from
(no idea which it is).

Unless someone can think of a way to track down and fix this I would
certainly suggest making the option off by default instead of on.

With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.

-- 
Len Sorensen


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 17:33 [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc Lennart Sorensen
@ 2011-12-23 18:17 ` Philippe Gerum
  2011-12-23 18:32   ` Lennart Sorensen
  2012-01-04 13:34 ` Philippe Gerum
  1 sibling, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2011-12-23 18:17 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: xenomai

On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo
> could possibly segfault, I finally discovered that the new feature in
> xenomai 2.6.0 (new when moving from 2.4.10 that is) of having preemptible
> context switches is what is corrupting the state of random linux processes
> once in a while.
>
> After turning the option off, I haven't seen a single crash, just as with 2.4.10.
>
> So something subtle is wrong with this option.
>
> It appears to be most likely to occur (possibly only likely) when
> xenomai is handling interrupts.
>
> It seems that getting an interrupt in the middle of a context switch at
> the wrong time corrupts the process that is being switched to or from
> (no idea which it is).
>
> Unless someone can think of a way to track down and fix this I would
> certainly suggest making the option off by default instead of on.
>

Papering over a bug this way is certainly not an option.

> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.
>

Which kernel version, what ppc hardware?

-- 
Philippe.


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 18:17 ` Philippe Gerum
@ 2011-12-23 18:32   ` Lennart Sorensen
  2011-12-23 20:08     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: Lennart Sorensen @ 2011-12-23 18:32 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
> Papering over a bug this way is certainly not an option.

Long term it certainly isn't.

> Which kernel version, what ppc hardware?

3.0.13, 3.0.9, 3.0.8.  mpc8360e.

xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04

-- 
Len Sorensen


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 18:32   ` Lennart Sorensen
@ 2011-12-23 20:08     ` Philippe Gerum
  2011-12-23 20:25       ` Lennart Sorensen
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2011-12-23 20:08 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: xenomai

On 12/23/2011 07:32 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 07:17:09PM +0100, Philippe Gerum wrote:
>> Papering over a bug this way is certainly not an option.
>
> Long term it certainly isn't.
>
>> Which kernel version, what ppc hardware?
>
> 3.0.13, 3.0.9, 3.0.8.  mpc8360e.
>
> xenomai 2.6.0 with ipipe 3.0.8-powerpc-2.13-04
>

Do you have a typical test scenario which triggers this bug?

-- 
Philippe.


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 20:08     ` Philippe Gerum
@ 2011-12-23 20:25       ` Lennart Sorensen
  2011-12-23 21:48         ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: Lennart Sorensen @ 2011-12-23 20:25 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
> Do you have a typical test scenario which triggers this bug?

It can take a couple of hours under pretty heavy load to get one
occurrence.  But with preemptible context switches off we haven't seen
any in a week.
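
A rough way to weigh that week without crashes against the earlier crash rate, assuming (this is an assumption, not something measured in the thread) that crashes arrive as a Poisson process with a constant rate of about one per two hours under comparable heavy load:

```latex
\begin{aligned}
\lambda &\approx \tfrac{1}{2}\ \mathrm{h}^{-1}, \qquad T = 7 \times 24\ \mathrm{h} = 168\ \mathrm{h},\\
P(\text{no crash in } T) &= e^{-\lambda T} = e^{-84} \approx 3 \times 10^{-37}.
\end{aligned}
```

Even if only 20 hours of that week saw comparable load, e^{-10} is about 4.5 x 10^{-5}, so under these assumptions the result is hard to explain by luck alone, although it says nothing about where the underlying bug lives.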

For sure xenomai tasks are handling interrupts quite a lot at the time.

I wish we had a simple test case to show it, but it seems to require
triggering an interrupt in the middle of a context switch at exactly
the wrong place.

-- 
Len Sorensen


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 20:25       ` Lennart Sorensen
@ 2011-12-23 21:48         ` Philippe Gerum
  2011-12-23 21:55           ` Lennart Sorensen
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2011-12-23 21:48 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: xenomai

On 12/23/2011 09:25 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 09:08:11PM +0100, Philippe Gerum wrote:
>> Do you have a typical test scenario which triggers this bug?
>
> It can take a couple of hours under pretty heavy load to get one
> occurrence.  But with preemptible context switches off we haven't seen
> any in a week.
>
> For sure xenomai tasks are handling interrupts quite a lot at the time.
>
> I wish we had a simple test case to show it, but it seems to require
> triggering an interrupt in the middle of a context switch at exactly
> the wrong place.
>

Is it reproducible with the basic latency or cyclic tests if waiting for 
long enough? Running ltp in parallel would trigger a decent load, but 
sometimes two shell loops forking commands in the background are enough 
to trigger a variety of issues when something fragile exists in the mmu 
layer as modified by the I-Pipe.
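
The "two shell loops forking commands in the background" idea can also be scripted so that crashes are counted rather than spotted by eye. A minimal sketch in C, assuming a POSIX system; fork_stress() and status_is_crash() are hypothetical helpers, not part of Xenomai or LTP. It repeatedly runs a short-lived command such as /bin/echo and reports any child killed by SIGSEGV or SIGILL (the symptoms reported in this thread) or SIGBUS:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if a wait() status shows the child was killed by a crash
 * signal (SIGSEGV/SIGILL as reported here, plus SIGBUS), else 0. */
static int status_is_crash(int status)
{
    int sig;

    if (!WIFSIGNALED(status))
        return 0;
    sig = WTERMSIG(status);
    return sig == SIGSEGV || sig == SIGILL || sig == SIGBUS;
}

/* Hypothetical helper: fork+exec `path` `iterations` times, keeping at
 * most `parallel` children alive at once. Returns the number of children
 * that died from a crash signal, or -1 if fork() fails. */
static int fork_stress(const char *path, int iterations, int parallel)
{
    int started = 0, running = 0, crashes = 0, status;

    while (started < iterations || running > 0) {
        /* Top up the pool of concurrently running children. */
        while (started < iterations && running < parallel) {
            pid_t pid = fork();

            if (pid < 0)
                return -1;      /* running children are left unreaped */
            if (pid == 0) {
                execl(path, path, (char *)NULL);
                _exit(127);     /* exec failed: plain exit, not a crash */
            }
            started++;
            running++;
        }

        /* Reap one child and classify how it ended. */
        if (wait(&status) < 0) {
            if (errno == EINTR)
                continue;
            break;              /* ECHILD: nothing left to reap */
        }
        running--;
        if (status_is_crash(status)) {
            crashes++;
            fprintf(stderr, "%s: child killed by signal %d\n",
                    path, WTERMSIG(status));
        }
    }
    return crashes;
}
```

For example, two processes each calling fork_stress("/bin/echo", 1000000, 4) next to the latency test roughly mimic the load suggested above; any nonzero return flags the kind of corruption discussed in this thread. Since wait() reaps any child of the caller, the helper assumes it is the only source of children.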

-- 
Philippe.


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 21:48         ` Philippe Gerum
@ 2011-12-23 21:55           ` Lennart Sorensen
  2011-12-23 21:58             ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: Lennart Sorensen @ 2011-12-23 21:55 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
> Is it reproducible with the basic latency or cyclic tests if waiting
> for long enough? Running ltp in parallel would trigger a decent
> load, but sometimes two shell loops forking commands in the
> background are enough to trigger a variety of issues when something
> fragile exists in the mmu layer as modified by the I-Pipe.

Well, we can try after I come back from vacation in a couple of weeks.

-- 
Len Sorensen


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 21:55           ` Lennart Sorensen
@ 2011-12-23 21:58             ` Philippe Gerum
  0 siblings, 0 replies; 12+ messages in thread
From: Philippe Gerum @ 2011-12-23 21:58 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: xenomai

On 12/23/2011 10:55 PM, Lennart Sorensen wrote:
> On Fri, Dec 23, 2011 at 10:48:29PM +0100, Philippe Gerum wrote:
>> Is it reproducible with the basic latency or cyclic tests if waiting
>> for long enough? Running ltp in parallel would trigger a decent
>> load, but sometimes two shell loops forking commands in the
>> background are enough to trigger a variety of issues when something
>> fragile exists in the mmu layer as modified by the I-Pipe.
>
> Well, we can try after I come back from vacation in a couple of weeks.
>

Ok. I will try to reproduce on my side as well.

-- 
Philippe.


* Re: [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2011-12-23 17:33 [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc Lennart Sorensen
  2011-12-23 18:17 ` Philippe Gerum
@ 2012-01-04 13:34 ` Philippe Gerum
  2018-03-21 15:40   ` [Xenomai] " Frank Benkert
  1 sibling, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2012-01-04 13:34 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: xenomai

On 12/23/2011 06:33 PM, Lennart Sorensen wrote:
> After spending quite a while trying to explain how things like /bin/echo
> could possibly segfault, I finally discovered that the new feature in
> xenomai 2.6.0 (new when moving from 2.4.10 that is) of having preemptible
> context switches is what is corrupting the state of random linux processes
> once in a while.
> 
> After turning the option off, I haven't seen a single crash, just as with 2.4.10.
> 
> So something subtle is wrong with this option.
> 
> It appears to be most likely to occur (possibly only likely) when
> xenomai is handling interrupts.
> 
> It seems that getting an interrupt in the middle of a context switch at
> the wrong time corrupts the process that is being switched to or from
> (no idea which it is).
> 
> Unless someone can think of a way to track down and fix this I would
> certainly suggest making the option off by default instead of on.
> 
> With CONFIG_XENO_HW_UNLOCKED_SWITCH=n I don't have any problems anymore.
> 

Does the patch below help?
http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=f38d0b2a820104411c5a33636f6dab634a9bffc1


-- 
Philippe.


* [Xenomai] [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2012-01-04 13:34 ` Philippe Gerum
@ 2018-03-21 15:40   ` Frank Benkert
  2018-03-21 16:40     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: Frank Benkert @ 2018-03-21 15:40 UTC (permalink / raw)
  To: xenomai

Sorry for bumping this old topic, but it needs some clarification:

For all of you who find this thread via search engines because you are 
looking for sporadic crashes on Xenomai PowerPC in relation to task 
switches:
This patch does not fix the problem, at least not in our case. After
several years, the problem suddenly appeared for us, now that we are
increasingly using the Ethernet interface on our old product. Maybe the
new interrupt load triggers this old bug.

There is a patch in Xenomai 3 which removes the buggy feature because of 
problems with the MMU.
The only way to fix these sporadic crashes is to disable the switch 
(CONFIG_XENO_HW_UNLOCKED_SWITCH=n).

See also
http://git.xenomai.org/?p=ipipe.git;a=commit;h=614aa59453dacf7693fbb18229c27676c2803dbb
http://git.xenomai.org/?p=ipipe.git;a=commit;h=04ea520ab96a16ec65529a2efed92c9a4a8bda34



* Re: [Xenomai] [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2018-03-21 15:40   ` [Xenomai] " Frank Benkert
@ 2018-03-21 16:40     ` Philippe Gerum
  2018-03-22  7:22       ` Frank Benkert
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2018-03-21 16:40 UTC (permalink / raw)
  To: Frank Benkert, xenomai

On 03/21/2018 04:40 PM, Frank Benkert wrote:
> Sorry for bumping this old topic, but it needs some clarification:
> 
> For all of you who find this thread via search engines because you are
> looking for sporadic crashes on Xenomai PowerPC in relation to task
> switches:
> This patch does not fix the problem, at least not in our case. After
> several years, the problem suddenly appeared for us, now that we are
> increasingly using the Ethernet interface on our old product. Maybe the
> new interrupt load triggers this old bug.
> 
> There is a patch in Xenomai 3 which removes the buggy feature because of
> problems with the MMU.

No, the commit you are referring to reads as this:

commit 323824258692a6d175881d18a644b276858b353d
Author: Philippe Gerum <rpm@xenomai.org>
Date:   Sat Nov 14 16:41:13 2015 +0100

    cobalt/powerpc: drop support for unlocked context switch

    This feature never actually brought any measurable gain on powerpc
    platforms, compared to the complexity of its implementation in the
    pipeline. It was primarily aimed at reducing latency for interrupt
    handlers when costly cache and TLB flushes are required to switch
    context, at the expense of increasing the scheduling latency.  It
    turned out to be counter-productive on common powerpc platforms, with
    efficient MMUs.

    This feature has been default off for a while now, and 4.1+ pipelines
    won't provide support for it anymore. Time to drop support from
    Xenomai too.

This was a decision based on the unfavorable performance vs complexity
ratio, not because of any pending bug that could not be fixed.

> The only way to fix these sporadic crashes is to disable the switch
> (CONFIG_XENO_HW_UNLOCKED_SWITCH=n).
> 

Possibly not, because your reasoning assumes that only the IRQ pipeline
might be involved in dealing with unlocked switching, which is wrong.
The Xenomai core is involved too, as hinted by the commit log above. If
you are actually running the stock 2.6.0 release, another attempt at
addressing the random crash issue would be to merge this commit:

commit ffc58d175a4e6f335c0e42946fa45ca984a93ce4
Author: Philippe Gerum <rpm@xenomai.org>
Date:   Wed Jan 4 14:14:11 2012 +0100

    hal/powerpc: plug race in thread context switch

    Since rthal_thread_switch() is entered with hw IRQs enabled when
    CONFIG_XENO_HW_UNLOCKED_SWITCH is in effect, we ought to mask them
    around the register swap. This is in essence an overdue fix for the
    issue spotted and solved quite some time ago by Jesper Christensen,
    see: https://mail.gna.org/public/xenomai-core/2011-04/msg00095.html.

    Configurations with CONFIG_XENO_HW_UNLOCKED_SWITCH disabled are immune
    to this issue.
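
The race described in this commit log can be illustrated outside the kernel. The toy below is only an analogy, not the actual rthal_thread_switch() code: a POSIX signal stands in for a hardware interrupt, sigprocmask() for masking hard IRQs, and two hypothetical globals for CPU state that must be updated together during a switch. raise() injects the "interrupt" deterministically between the two updates; with the signal masked, delivery is deferred until the swap is complete, so the handler never observes the half-switched state:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Toy "CPU state": both fields must always name the same task. */
static volatile sig_atomic_t cur_task_regs; /* whose registers are live */
static volatile sig_atomic_t cur_task_mm;   /* whose address space is live */
static volatile sig_atomic_t irq_saw_consistent_state;

/* Stand-in for an interrupt handler that relies on the CPU state. */
static void fake_irq_handler(int sig)
{
    (void)sig;
    irq_saw_consistent_state = (cur_task_regs == cur_task_mm);
}

/* Switch to task `next` in two non-atomic steps. With `mask_irqs`
 * nonzero, SIGUSR1 (the fake IRQ) is blocked around the swap, the
 * analogue of masking hardware IRQs around the register swap.
 * Returns 1 if the handler saw a consistent state, 0 otherwise. */
static int toy_switch(int next, int mask_irqs)
{
    struct sigaction sa;
    sigset_t irq, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fake_irq_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&irq);
    sigaddset(&irq, SIGUSR1);
    irq_saw_consistent_state = -1;

    if (mask_irqs)
        sigprocmask(SIG_BLOCK, &irq, &old);

    cur_task_mm = next;     /* step 1: switch address space */
    raise(SIGUSR1);         /* "interrupt" arrives mid-switch */
    cur_task_regs = next;   /* step 2: swap registers */

    /* Unmasking delivers the pending signal before sigprocmask returns. */
    if (mask_irqs)
        sigprocmask(SIG_SETMASK, &old, NULL);

    return irq_saw_consistent_state;
}
```

In a single-threaded program, toy_switch(1, 0) returns 0 because the handler runs between the two steps, while toy_switch(2, 1) returns 1. In the real switch code the equivalent window is far narrower, which would fit the observation that the crash needs an interrupt at exactly the wrong place and hours of load to appear.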

-- 
Philippe.


* Re: [Xenomai] [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc.
  2018-03-21 16:40     ` Philippe Gerum
@ 2018-03-22  7:22       ` Frank Benkert
  0 siblings, 0 replies; 12+ messages in thread
From: Frank Benkert @ 2018-03-22  7:22 UTC (permalink / raw)
  To: xenomai

Hi Philippe,

thanks for responding so quickly. I've only just realized that I posted
on the wrong list. Sorry for that.

We are currently running Xenomai 2.6.5 and the problem still exists:
random crashes of Xenomai and non-Xenomai processes with SIGSEGV and
SIGILL at various locations, without any recognisable correlation.

This means that the patch
 > commit ffc58d175a4e6f335c0e42946fa45ca984a93ce4
 > Author: Philippe Gerum <rpm@xenomai.org>
 > Date:   Wed Jan 4 14:14:11 2012 +0100
 >
 >      hal/powerpc: plug race in thread context switch
does not fix the problem in our case - sorry.

My recommendation, at least on an old MPC5200 processor, is to disable the
unlocked-switch functionality to prevent these crashes. In the meantime,
our test systems have run ten times as long without any abnormalities.

This is what my original post should have said for anyone stumbling over
this thread while digging into these random crashes.

Now I understand that removing this feature in Xenomai 3 was not driven
by bug reports. My mistake.

Thanks!


Thread overview: 12+ messages
2011-12-23 17:33 [Xenomai-core] CONFIG_XENO_HW_UNLOCKED_SWITCH=y causes random process corruption in xenomai 2.6.0 on powerpc Lennart Sorensen
2011-12-23 18:17 ` Philippe Gerum
2011-12-23 18:32   ` Lennart Sorensen
2011-12-23 20:08     ` Philippe Gerum
2011-12-23 20:25       ` Lennart Sorensen
2011-12-23 21:48         ` Philippe Gerum
2011-12-23 21:55           ` Lennart Sorensen
2011-12-23 21:58             ` Philippe Gerum
2012-01-04 13:34 ` Philippe Gerum
2018-03-21 15:40   ` [Xenomai] " Frank Benkert
2018-03-21 16:40     ` Philippe Gerum
2018-03-22  7:22       ` Frank Benkert
