* [Xenomai-core] [BUG] trunk: screwed Linux irq state
@ 2007-02-11 22:13 Jan Kiszka
  2007-02-11 22:26 ` Gilles Chanteperdrix
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jan Kiszka @ 2007-02-11 22:13 UTC (permalink / raw)
  To: xenomai-core


[-- Attachment #1.1: Type: text/plain, Size: 2271 bytes --]

Hi,

while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
startup procedure:

<4>[  137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us)
<4>[  142.291455] BUG: at kernel/fork.c:993 copy_process()
<4>[  142.291585]  [<c0103a8f>] show_trace_log_lvl+0x1f/0x40
<4>[  142.291767]  [<c0104237>] show_trace+0x17/0x20
<4>[  142.291896]  [<c010432b>] dump_stack+0x1b/0x20
<4>[  142.292026]  [<c0111e94>] copy_process+0x914/0x13d0
<4>[  142.292190]  [<c0112b80>] do_fork+0x70/0x1b0
<4>[  142.292323]  [<c0101078>] sys_clone+0x38/0x40
<4>[  142.292620]  [<c010320f>] syscall_call+0x7/0xb
<4>[  142.292747]  =======================
<3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
<4>[  142.293052] in_atomic():0, irqs_disabled():1
<4>[  142.293152] no locks held by init/1.
<4>[  142.293244] irq event stamp: 500992
<4>[  142.293335] hardirqs last  enabled at (500991): [<c010b77c>] __ipipe_handle_exception+0xdc/0x188
<4>[  142.293737] hardirqs last disabled at (500992): [<c010306e>] ret_from_exception+0xe/0x20
<4>[  142.293967] softirqs last  enabled at (500868): [<c01189ab>] __do_softirq+0xab/0xc0
<4>[  142.294189] softirqs last disabled at (500861): [<c0118a55>] do_softirq+0x95/0xa0
<4>[  142.294562]  [<c0103a8f>] show_trace_log_lvl+0x1f/0x40
<4>[  142.294743]  [<c0104237>] show_trace+0x17/0x20
<4>[  142.294897]  [<c010432b>] dump_stack+0x1b/0x20
<4>[  142.295050]  [<c010f35d>] __might_sleep+0xcd/0x100
<4>[  142.295220]  [<c017d771>] kmem_cache_alloc+0xa1/0xc0
<4>[  142.295527]  [<c0110f89>] dup_fd+0x29/0x2d0
<4>[  142.295689]  [<c0111279>] copy_files+0x49/0x70
<4>[  142.295851]  [<c0111c2f>] copy_process+0x6af/0x13d0
<4>[  142.296019]  [<c0112b80>] do_fork+0x70/0x1b0
<4>[  142.296178]  [<c0101078>] sys_clone+0x38/0x40
<4>[  142.296326]  [<c010320f>] syscall_call+0x7/0xb
<4>[  142.296641]  =======================

I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I
attached my .config. The interesting thing is that this doesn't show up
with v2.3.x head (kernel & config identical). Switching back to 2.6.19
doesn't change the picture.

Does anyone have an idea? No-COW is now in both trunk and 2.3.x, right?

Jan


[-- Attachment #1.2: config.bz2 --]
[-- Type: application/x-bzip, Size: 7936 bytes --]



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka
@ 2007-02-11 22:26 ` Gilles Chanteperdrix
  2007-02-11 22:31 ` Gilles Chanteperdrix
  2007-02-11 22:42 ` Philippe Gerum
  2 siblings, 0 replies; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-11 22:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
 > Hi,
 > 
 > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
 > startup procedure:
 > 
 > [...]
 > 
 > I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I
 > attached my .config. The interesting thing is that this doesn't show up
 > with v2.3.x head (kernel & config identical). Switching back to 2.6.19
 > doesn't change the picture.
 > 
 > Anyone any idea? No-COW is now both in trunk and 2.3.x, right?

To check whether this is an effect of the no-COW patch, comment out the
I-pipe portion of copy_one_pte() in mm/memory.c. That code should only
have an effect if you fork from a real-time application, though.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka
  2007-02-11 22:26 ` Gilles Chanteperdrix
@ 2007-02-11 22:31 ` Gilles Chanteperdrix
  2007-02-11 22:42 ` Philippe Gerum
  2 siblings, 0 replies; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-11 22:31 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
 > Hi,
 > 
 > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
 > startup procedure:
 > 
 > [...]
 > 
 > I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I
 > attached my .config. The interesting thing is that this doesn't show up
 > with v2.3.x head (kernel & config identical). Switching back to 2.6.19
 > doesn't change the picture.
 > 
 > Anyone any idea? No-COW is now both in trunk and 2.3.x, right?

alloc_page_vma() in copy_one_pte() uses the GFP_HIGHUSER flag, whereas the
BUG report suggests we are in an atomic section, so maybe we should use
GFP_ATOMIC | __GFP_HIGHMEM instead.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka
  2007-02-11 22:26 ` Gilles Chanteperdrix
  2007-02-11 22:31 ` Gilles Chanteperdrix
@ 2007-02-11 22:42 ` Philippe Gerum
  2007-02-11 23:07   ` Gilles Chanteperdrix
  2 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-11 22:42 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
> Hi,
> 
> while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
> startup procedure:
> 
> [...]
> <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
> <4>[  142.293052] in_atomic():0, irqs_disabled():1
                                                 ^^^^

Typical of something going wrong in entry.S.

> [...]
> 
> I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I
> attached my .config. The interesting thing is that this doesn't show up
> with v2.3.x head (kernel & config identical). Switching back to 2.6.19
> doesn't change the picture.
> 

Could you disable the tracer and remove the xnpod_suspend_thread()
patch, then downgrade to 1.6-04 to remove the COW support? TIA,

> Anyone any idea? No-COW is now both in trunk and 2.3.x, right?
> 

Only with the I-pipe patches from the 1.7 series on x86.

> Jan
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core
-- 
Philippe.





* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-11 22:42 ` Philippe Gerum
@ 2007-02-11 23:07   ` Gilles Chanteperdrix
  2007-02-11 23:49     ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-11 23:07 UTC (permalink / raw)
  To: rpm; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
 > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
 > > Hi,
 > > 
 > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
 > > startup procedure:
 > > 
 > > [...]
 > > <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
 > > <4>[  142.293052] in_atomic():0, irqs_disabled():1
 >                                                  ^^^^
 > 
 > Typical of something going wrong in entry.S.

You mean interrupts are not really disabled when forking? :-)

So, I am afraid the new fpu_counter optimization is buggy: if a task
with fpu_counter greater than 5 forks and is preempted right after
prepare_to_copy() in dup_task_struct(), then when the system switches
back to this task, its FPU context will be restored and TS_USEDFPU set
again in its flags, thereby voiding the effect of prepare_to_copy().

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-11 23:07   ` Gilles Chanteperdrix
@ 2007-02-11 23:49     ` Philippe Gerum
  2007-02-12  0:20       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-11 23:49 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>  > > Hi,
>  > > 
>  > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
>  > > startup procedure:
>  > > 
>  > > [...]
>  > > <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>  > > <4>[  142.293052] in_atomic():0, irqs_disabled():1
>  >                                                  ^^^^
>  > 
>  > Typical of something going wrong in entry.S.
> 
> You mean, interrupts are not really disabled when forking ? :-)
> 

Eh, mmmh, no. Hopefully.

> So, I am afraid the new fpu_counter optimization is buggy: if a task
> forks with fpu_counter greater than 5 and is preempted right after
> prepare_to_copy in dup_task_struct, when the system switches back to
> this task, the task FPU context will be restored and TS_USEDFPU set in
> the task flags, thereby voiding the effect of prepare_to_copy.
> 

You mean that the parent's FPU context would leak into the child's?
Well, maybe the LKML people would like to know about this. As a
side note, I don't see anything wrong with your latest countermeasure,
which disables this optimization in Xenomai's context switch code, even
in the buggy case above. Right?

-- 
Philippe.





* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-11 23:49     ` Philippe Gerum
@ 2007-02-12  0:20       ` Gilles Chanteperdrix
  2007-02-12  0:28         ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-12  0:20 UTC (permalink / raw)
  To: rpm; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
 > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
 > > Philippe Gerum wrote:
 > >  > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
 > >  > > Hi,
 > >  > > 
 > >  > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
 > >  > > startup procedure:
 > >  > > 
 > >  > > [...]
 > >  > > <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
 > >  > > <4>[  142.293052] in_atomic():0, irqs_disabled():1
 > >  >                                                  ^^^^
 > >  > 
 > >  > Typical of something going wrong in entry.S.
 > > 
 > > You mean, interrupts are not really disabled when forking ? :-)
 > > 
 > 
 > Eh, mmmh, no. Hopefully.
 > 
 > > So, I am afraid the new fpu_counter optimization is buggy: if a task
 > > forks with fpu_counter greater than 5 and is preempted right after
 > > prepare_to_copy in dup_task_struct, when the system switches back to
 > > this task, the task FPU context will be restored and TS_USEDFPU set in
 > > the task flags, thereby voiding the effect of prepare_to_copy.
 > > 
 > 
 > You mean that the parent FPU context would leak into the child's one?

Yes, something like that. The result is random segfaults; I do not
remember exactly why.

 > Well, maybe the LKML people would like to know about this. As a
 > sidenote, I don't see anything bad with your latest counter-measure
 > disabling this optimization in Xenomai's context switch code, even in
 > the bugous case above. Right? 

Right: if there are random segfaults, they will not be Xenomai's fault.

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12  0:20       ` Gilles Chanteperdrix
@ 2007-02-12  0:28         ` Jan Kiszka
  2007-02-12  1:10           ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2007-02-12  0:28 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 2706 bytes --]

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>  > > Philippe Gerum wrote:
>  > >  > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>  > >  > > Hi,
>  > >  > > 
>  > >  > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
>  > >  > > startup procedure:
>  > >  > > 
>  > >  > > [...]
>  > >  > > <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>  > >  > > <4>[  142.293052] in_atomic():0, irqs_disabled():1
>  > >  >                                                  ^^^^
>  > >  > 
>  > >  > Typical of something going wrong in entry.S.
>  > > 
>  > > You mean, interrupts are not really disabled when forking ? :-)
>  > > 
>  > 
>  > Eh, mmmh, no. Hopefully.
>  > 
>  > > So, I am afraid the new fpu_counter optimization is buggy: if a task
>  > > forks with fpu_counter greater than 5 and is preempted right after
>  > > prepare_to_copy in dup_task_struct, when the system switches back to
>  > > this task, the task FPU context will be restored and TS_USEDFPU set in
>  > > the task flags, thereby voiding the effect of prepare_to_copy.
>  > > 
>  > 
>  > You mean that the parent FPU context would leak into the child's one?
> 
> Yes, something like that. The result is random segfaults, I do not
> remember exactly why.
> 
>  > Well, maybe the LKML people would like to know about this. As a
>  > sidenote, I don't see anything bad with your latest counter-measure
>  > disabling this optimization in Xenomai's context switch code, even in
>  > the bugous case above. Right? 
> 
> Right, if there are random segfaults, they will not be xenomai's fault.
> 

I'm currently sorting out the symptoms again, or rather looking for where
they went. 2.6.20 just decided to work normally again; 2.6.19 needs a
re-check.

It now appears that the tracer played an important role, but I'm not
100% sure yet. I'll keep you posted.

Jan




* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12  0:28         ` Jan Kiszka
@ 2007-02-12  1:10           ` Jan Kiszka
  2007-02-12 11:49             ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2007-02-12  1:10 UTC (permalink / raw)
  To: Philippe Gerum, Gilles Chanteperdrix; +Cc: xenomai-core


[-- Attachment #1.1: Type: text/plain, Size: 3367 bytes --]

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>  > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote:
>>  > > Philippe Gerum wrote:
>>  > >  > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote:
>>  > >  > > Hi,
>>  > >  > > 
>>  > >  > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave
>>  > >  > > startup procedure:
>>  > >  > > 
>>  > >  > > [...]
>>  > >  > > <3>[  142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034
>>  > >  > > <4>[  142.293052] in_atomic():0, irqs_disabled():1
>>  > >  >                                                  ^^^^
>>  > >  > 
>>  > >  > Typical of something going wrong in entry.S.
>>  > > 
>>  > > You mean, interrupts are not really disabled when forking ? :-)
>>  > > 
>>  > 
>>  > Eh, mmmh, no. Hopefully.
>>  > 
>>  > > So, I am afraid the new fpu_counter optimization is buggy: if a task
>>  > > forks with fpu_counter greater than 5 and is preempted right after
>>  > > prepare_to_copy in dup_task_struct, when the system switches back to
>>  > > this task, the task FPU context will be restored and TS_USEDFPU set in
>>  > > the task flags, thereby voiding the effect of prepare_to_copy.
>>  > > 
>>  > 
>>  > You mean that the parent FPU context would leak into the child's one?
>>
>> Yes, something like that. The result is random segfaults, I do not
>> remember exactly why.
>>
>>  > Well, maybe the LKML people would like to know about this. As a
>>  > sidenote, I don't see anything bad with your latest counter-measure
>>  > disabling this optimization in Xenomai's context switch code, even in
>>  > the bugous case above. Right? 
>>
>> Right, if there are random segfaults, they will not be xenomai's fault.
>>
> 
> I'm currently sorting the symptoms again, or better I'm looking where
> they went to. 2.6.20 just decided to work normally again, 2.6.19 needs a
> re-check.
> 
> It appears now that the tracer played an important role, but I'm not
> 100% sure yet. I'll keep you posted.

2.6.19 did not magically start working as well. Instead, I now have a
back trace; see the attachment.

I included the full set of 16k trace points, but the interesting part is
around -73 to -25: some Linux process with IRQs enabled gets preempted by
an RT IRQ (the RTnet NIC). That causes an RT kernel thread (the RTnet
stack manager, prio 98) to run for a while. But when control returns to
Linux, its IRQs now remain masked. The reason must be that weird
exception at -62. I don't know where it comes from, nor why there is no
report about THAT issue in the kernel logs.

OK, enough for today.

Jan

[-- Attachment #1.2: back-trace.bz2 --]
[-- Type: application/x-bzip, Size: 70700 bytes --]



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12  1:10           ` Jan Kiszka
@ 2007-02-12 11:49             ` Jan Kiszka
  2007-02-12 13:16               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2007-02-12 11:49 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3640 bytes --]

Jan Kiszka wrote:
> 2.6.19 didn't magically start to work as well. Instead I have a back
> trace now, see attachment.
> 
> I included a full set of 16k points, but the thrilling things are around
> -73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
> (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
> stack manager, prio 98). But when returning to Linux again, its IRQs
> remain masked now. The reason must be that weird exception at -62. Don't
> know where it comes from and why is there no report about THAT issue in
> the kernel logs.

I'll track down the cause of this page fault later today, but the way
it is handled already raises some doubts. To make the discussion
easier, here is the relevant excerpt from the trace:

> :    +func                 -73+   1.426  link_path_walk+0x14 (__link_path_walk+0xca0)
> :|   +func                 -72    0.605  __ipipe_handle_irq+0x14 (common_interrupt+0x18)
> :|   +func                 -71    0.472  __ipipe_ack_irq+0x8 (__ipipe_handle_irq+0xaf)
> :|   +func                 -70    0.224  __ipipe_ack_level_irq+0x12 (__ipipe_ack_irq+0x19)
> :|   +func                 -70+   4.424  mask_and_ack_8259A+0x14 (__ipipe_ack_level_irq+0x22)
> :|   +func                 -66    0.475  __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x62)
> :|  # func                 -65    0.974  xnintr_irq_handler+0xe (__ipipe_dispatch_wired+0x95)
> :|  # func                 -64+   1.892  rtl8139_interrupt+0x11 [rt_8139too] (xnintr_irq_handler+0x3b)
> :|  # func                 -62    0.382  __ipipe_handle_exception+0xe (error_code+0x3e)
> :|  # func                 -62    0.222  __ipipe_test_root+0x8 (__ipipe_handle_exception+0x1a)
> :|  # func                 -62    0.377  __ipipe_stall_root+0x8 (__ipipe_handle_exception+0x15b)
> :|  #*func                 -62    0.173  trace_hardirqs_off+0xc (__ipipe_handle_exception+0x165)
> :|  #*func                 -61    0.211  __ipipe_test_root+0x8 (trace_hardirqs_off+0x2d)
> :|  #*func                 -61+   1.965  do_page_fault+0xe (__ipipe_handle_exception+0x6d)
> :   #*func                 -59    0.180  trace_hardirqs_on+0x11 (__ipipe_handle_exception+0xd9)
> :   #*func                 -59    0.163  __ipipe_test_root+0x8 (trace_hardirqs_on+0x5e)
> :   #*func                 -59    0.396  mark_held_locks+0xe (trace_hardirqs_on+0x8b)
> :   #*func                 -58    0.212  mark_held_locks+0xe (trace_hardirqs_on+0xc9)
> :   #*func                 -58    0.461  __ipipe_restore_root+0x8 (__ipipe_handle_exception+0xe1)
> :   #*func                 -58    0.253  __ipipe_unstall_root+0x8 (__ipipe_restore_root+0x18)
> :   # func                 -57    0.224  __ipipe_stall_root+0x8 (ret_from_exception+0x5)
> :   #*func                 -57    0.366  trace_hardirqs_off+0xc (ret_from_exception+0xe)
> :   #*func                 -57    0.327  __ipipe_test_root+0x8 (trace_hardirqs_off+0x2d)
> :|  #*func                 -57+   2.089  __ipipe_unstall_iret_root+0x8 (restore_nocheck_notrace+0x0)
> :|  #*func                 -54+   1.444  alloc_rtskb+0xa [rtnet] (rtl8139_interrupt+0x182 [rt_8139too])
> :|  #*func                 -53+   1.172  rt_eth_type_trans+0xe [rtnet] (rtl8139_interrupt+0x1d6 [rt_8139too]) 

The fault gets forwarded to Linux because ipipe_trap_notify() does not
intercept it: we are running neither over a task with PF_EVNOTIFY set
nor over a kernel thread yet (IPIPE_NOSTACK_FLAG). Still, we are already
in the primary domain, so I wonder whether this forwarding is
intentional. At the very least, it seems to break some things later on...

Jan




* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 11:49             ` Jan Kiszka
@ 2007-02-12 13:16               ` Gilles Chanteperdrix
  2007-02-12 13:46                 ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-12 13:16 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Jan Kiszka wrote:
> 
>>2.6.19 didn't magically start to work as well. Instead I have a back
>>trace now, see attachment.
>>
>>I included a full set of 16k points, but the thrilling things are around
>>-73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
>>(RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
>>stack manager, prio 98). But when returning to Linux again, its IRQs
>>remain masked now. The reason must be that weird exception at -62. Don't
>>know where it comes from and why is there no report about THAT issue in
>>the kernel logs.
> 
> 
> The cause of this page fault will get tracked down later today, but the
> way it is handled already causes some doubts to me. To make discussion
> easier, here is the relevant excerpt from the trace:

Maybe this fault is due to the no-cow patch? Before the no-cow patch,
vmalloc'ed areas were added to the page directories of all processes;
now they are added only to the page directories of processes with the
VM_PINNED flag. So if ipipe_test_root tries to access some module memory
area over the context of a non-realtime thread, a fault will occur.
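The mechanism suggested above can be illustrated with a toy C model. This is a hypothetical sketch, not kernel code: struct model_mm, the MODEL_NR_PGD-slot table and model_vmalloc_map() are invented for illustration, and the pinned field only stands in for VM_PINNED. With eager propagation every page directory receives the new entry; without it only pinned ones do, so an access made over a non-pinned context finds an empty slot and faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PGD 8 /* toy page-directory size */

struct model_mm {
	bool pinned;                   /* stands in for VM_PINNED */
	const void *pgd[MODEL_NR_PGD]; /* NULL means "entry not populated" */
};

/* Reference kernel page directory (plays the role of init_mm); it always
 * receives new mappings. */
static const void *model_ref_pgd[MODEL_NR_PGD];

/* Publishes a new vmalloc mapping. With eager == true every mm gets the
 * entry (behavior before the no-cow patch, as described); otherwise only
 * pinned mms do (behavior described for the no-cow patch). */
static void model_vmalloc_map(struct model_mm **mms, size_t nr_mms,
			      size_t slot, const void *page, bool eager)
{
	assert(slot < MODEL_NR_PGD);
	model_ref_pgd[slot] = page;
	for (size_t i = 0; i < nr_mms; i++)
		if (eager || mms[i]->pinned)
			mms[i]->pgd[slot] = page;
}

/* Returns true if touching 'slot' while 'mm' is the active context
 * would fault. */
static bool model_access_faults(const struct model_mm *mm, size_t slot)
{
	assert(mm != NULL && slot < MODEL_NR_PGD);
	return mm->pgd[slot] == NULL;
}
```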

-- 
                                                 Gilles Chanteperdrix



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 13:16               ` Gilles Chanteperdrix
@ 2007-02-12 13:46                 ` Philippe Gerum
  2007-02-12 13:49                   ` Jan Kiszka
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-12 13:46 UTC
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
> Maybe this fault is due to the no-cow patch? Before the no-cow patch,
> vmalloc'ed areas were added to the page directories of all processes;
> now they are added only to the page directories of processes with the
> VM_PINNED flag. So if ipipe_test_root tries to access some module memory
> area over the context of a non-realtime thread, a fault will occur.
> 

Yes, it's a minor fault caused by on-demand memory mapping; this is
why we don't get any alarming message in the kernel log.
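A minimal sketch of why such a fault stays silent, assuming the usual lazy-synchronization scheme: the fault handler finds the missing entry in the reference kernel page directory (the role init_mm plays), copies it into the active page directory and returns, so nothing is logged. Everything named model_* below is hypothetical; a fault with no reference entry to copy is modeled as a genuine oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PGD 8

enum model_fault_result {
	MODEL_FAULT_FIXED_SILENTLY, /* minor fault: entry copied, nothing logged */
	MODEL_FAULT_OOPS            /* genuine bad access: would be reported */
};

struct model_mm {
	const void *pgd[MODEL_NR_PGD]; /* NULL means "entry not populated" */
};

/* Reference kernel page directory (stands in for init_mm). */
static const void *model_ref_pgd[MODEL_NR_PGD];

/* Handles a kernel-space fault taken while 'active_mm' is loaded. */
static enum model_fault_result model_kernel_fault(struct model_mm *active_mm,
						  size_t slot)
{
	assert(active_mm != NULL && slot < MODEL_NR_PGD);
	if (model_ref_pgd[slot] == NULL)
		return MODEL_FAULT_OOPS;            /* nothing to sync from */
	active_mm->pgd[slot] = model_ref_pgd[slot]; /* lazy sync, no log */
	return MODEL_FAULT_FIXED_SILENTLY;
}
```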

-- 
Philippe.





* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 13:46                 ` Philippe Gerum
@ 2007-02-12 13:49                   ` Jan Kiszka
  2007-02-12 14:10                     ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2007-02-12 13:49 UTC
  To: rpm; +Cc: xenomai-core


Philippe Gerum wrote:
> Yes, it's a minor fault caused by on-demand memory mapping; this is
> why we don't get any alarming message in the kernel log.
> 

Looks like it's something that should never happen, for sure. But are we
fine with screwing up the Linux IRQ state nevertheless? In other words,
are we seeing one or two ipipe issues here?




* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 13:49                   ` Jan Kiszka
@ 2007-02-12 14:10                     ` Philippe Gerum
  2007-02-12 14:39                       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-12 14:10 UTC
  To: Jan Kiszka; +Cc: xenomai-core

On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
> Looks like it's something that should never happen, for sure.

Now that vmalloc & ioremap memory may again have its PTEs set on demand
due to the nocow patch, minor faults in kernel space are possible once
more. But this should only happen on behalf of the Linux domain; it is
not expected to happen in primary mode.

>  But are we
> fine with screwing up the Linux IRQ state nevertheless? In other words,
> are we seeing one or two ipipe issues here?

The I-pipe would only restore the virtual interrupt flag, as seen on
exception entry, for exceptions taken on behalf of the Linux domain, not
for those taken in primary mode.
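One reading of this, expressed as a toy C model (hypothetical names, not the I-pipe implementation): the Linux exception path leaves the root domain stalled, and the state saved on entry is put back only when the exception came from the Linux domain. A fault taken in primary mode would then leave Linux with its virtual IRQs masked, which would be consistent with the symptom reported at the start of the thread. The assumption that the Linux exit path stalls the root domain is taken from the kernel log in the original report, where hardirqs were last disabled in ret_from_exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pipeline {
	bool root_stalled; /* virtual IRQ-disable bit of the Linux (root) domain */
	bool in_primary;   /* true while a real-time domain owns the CPU */
};

/* Models a fault forwarded to the Linux handler, under the assumptions
 * stated above. */
static void model_forward_exception(struct model_pipeline *p)
{
	assert(p != NULL);
	bool saved = p->root_stalled; /* state seen on exception entry */
	bool from_root = !p->in_primary;

	/* Assumed: the Linux exit path disables (virtual) IRQs. */
	p->root_stalled = true;

	if (from_root)
		p->root_stalled = saved; /* restored for the Linux domain only */
	/* In primary mode nothing puts the saved state back. */
}
```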

-- 
Philippe.





* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 14:10                     ` Philippe Gerum
@ 2007-02-12 14:39                       ` Gilles Chanteperdrix
  2007-02-12 15:10                         ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-12 14:39 UTC
  To: rpm; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
> Now that vmalloc & ioremap memory may again have its PTEs set on demand
> due to the nocow patch, minor faults in kernel space are possible once
> more. But this should only happen on behalf of the Linux domain; it is
> not expected to happen in primary mode.

Doesn't a primary-mode IRQ handler borrow the MMU context of the task
it preempts?

-- 
                                                 Gilles Chanteperdrix



* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 14:39                       ` Gilles Chanteperdrix
@ 2007-02-12 15:10                         ` Philippe Gerum
  2007-02-12 19:02                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-12 15:10 UTC
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
> Doesn't a primary-mode IRQ handler borrow the MMU context of the task
> it preempts?
> 

Yes, this is where the problem lies if we happen to preempt a regular
task and tread over code that might trigger minor faults. The best way
to check this would be to somehow enable VM_PINNED for all tasks. Back
to square one.
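The borrowed-context problem and the proposed check can be sketched as follows, again as a hypothetical model rather than real code: a real-time IRQ handler runs over whatever mm the preempted task had active, so whether a lazily mapped module or vmalloc area is reachable depends on that mm being pinned. Forcing VM_PINNED on all tasks (the pin_all_tasks parameter here, an invented knob) removes the dependency on which task happens to be preempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	const char *comm; /* task name, for readability only */
	bool mm_pinned;   /* stands in for VM_PINNED on the task's mm */
};

/* An RT interrupt handler does not switch page tables: it runs over the
 * mm of whatever task it preempted. Returns true if touching a lazily
 * mapped vmalloc/module area from the handler would fault. */
static bool model_rt_irq_would_fault(const struct model_task *preempted,
				     bool pin_all_tasks)
{
	assert(preempted != NULL);
	bool mapped = pin_all_tasks || preempted->mm_pinned;
	return !mapped;
}
```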

-- 
Philippe.





* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 15:10                         ` Philippe Gerum
@ 2007-02-12 19:02                           ` Gilles Chanteperdrix
  0 siblings, 0 replies; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-12 19:02 UTC
  To: rpm; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
> Yes, this is where the problem lies if we happen to preempt a regular
> task and tread over code that might trigger minor faults. The best way
> to check this would be to somehow enable VM_PINNED for all tasks. Back
> to square one.
> 

Ok. I'll try to change this and send a patch ASAP.

-- 
                                                 Gilles Chanteperdrix




Thread overview: 17+ messages
2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka
2007-02-11 22:26 ` Gilles Chanteperdrix
2007-02-11 22:31 ` Gilles Chanteperdrix
2007-02-11 22:42 ` Philippe Gerum
2007-02-11 23:07   ` Gilles Chanteperdrix
2007-02-11 23:49     ` Philippe Gerum
2007-02-12  0:20       ` Gilles Chanteperdrix
2007-02-12  0:28         ` Jan Kiszka
2007-02-12  1:10           ` Jan Kiszka
2007-02-12 11:49             ` Jan Kiszka
2007-02-12 13:16               ` Gilles Chanteperdrix
2007-02-12 13:46                 ` Philippe Gerum
2007-02-12 13:49                   ` Jan Kiszka
2007-02-12 14:10                     ` Philippe Gerum
2007-02-12 14:39                       ` Gilles Chanteperdrix
2007-02-12 15:10                         ` Philippe Gerum
2007-02-12 19:02                           ` Gilles Chanteperdrix
