* [Xenomai-core] [BUG] trunk: screwed Linux irq state @ 2007-02-11 22:13 Jan Kiszka 2007-02-11 22:26 ` Gilles Chanteperdrix ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Jan Kiszka @ 2007-02-11 22:13 UTC (permalink / raw) To: xenomai-core [-- Attachment #1.1: Type: text/plain, Size: 2271 bytes --] Hi, while testing 2.6.20 with RTnet, I got this kernel BUG during the slave startup procedure: <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb <4>[ 142.292747] ======================= <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 <4>[ 142.293052] in_atomic():0, irqs_disabled():1 <4>[ 142.293152] no locks held by init/1. 
<4>[ 142.293244] irq event stamp: 500992 <4>[ 142.293335] hardirqs last enabled at (500991): [<c010b77c>] __ipipe_handle_exception+0xdc/0x188 <4>[ 142.293737] hardirqs last disabled at (500992): [<c010306e>] ret_from_exception+0xe/0x20 <4>[ 142.293967] softirqs last enabled at (500868): [<c01189ab>] __do_softirq+0xab/0xc0 <4>[ 142.294189] softirqs last disabled at (500861): [<c0118a55>] do_softirq+0x95/0xa0 <4>[ 142.294562] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 <4>[ 142.294743] [<c0104237>] show_trace+0x17/0x20 <4>[ 142.294897] [<c010432b>] dump_stack+0x1b/0x20 <4>[ 142.295050] [<c010f35d>] __might_sleep+0xcd/0x100 <4>[ 142.295220] [<c017d771>] kmem_cache_alloc+0xa1/0xc0 <4>[ 142.295527] [<c0110f89>] dup_fd+0x29/0x2d0 <4>[ 142.295689] [<c0111279>] copy_files+0x49/0x70 <4>[ 142.295851] [<c0111c2f>] copy_process+0x6af/0x13d0 <4>[ 142.296019] [<c0112b80>] do_fork+0x70/0x1b0 <4>[ 142.296178] [<c0101078>] sys_clone+0x38/0x40 <4>[ 142.296326] [<c010320f>] syscall_call+0x7/0xb <4>[ 142.296641] ======================= I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I attached my .config. The interesting thing is that this doesn't show up with v2.3.x head (kernel & config identical). Switching back to 2.6.19 doesn't change the picture. Anyone any idea? No-COW is now both in trunk and 2.3.x, right? Jan [-- Attachment #1.2: config.bz2 --] [-- Type: application/x-bzip, Size: 7936 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 249 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka @ 2007-02-11 22:26 ` Gilles Chanteperdrix 2007-02-11 22:31 ` Gilles Chanteperdrix 2007-02-11 22:42 ` Philippe Gerum 2 siblings, 0 replies; 17+ messages in thread From: Gilles Chanteperdrix @ 2007-02-11 22:26 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Hi, > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > startup procedure: > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > <4>[ 142.292747] ======================= > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 > <4>[ 142.293152] no locks held by init/1. 
> <4>[ 142.293244] irq event stamp: 500992 > <4>[ 142.293335] hardirqs last enabled at (500991): [<c010b77c>] __ipipe_handle_exception+0xdc/0x188 > <4>[ 142.293737] hardirqs last disabled at (500992): [<c010306e>] ret_from_exception+0xe/0x20 > <4>[ 142.293967] softirqs last enabled at (500868): [<c01189ab>] __do_softirq+0xab/0xc0 > <4>[ 142.294189] softirqs last disabled at (500861): [<c0118a55>] do_softirq+0x95/0xa0 > <4>[ 142.294562] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > <4>[ 142.294743] [<c0104237>] show_trace+0x17/0x20 > <4>[ 142.294897] [<c010432b>] dump_stack+0x1b/0x20 > <4>[ 142.295050] [<c010f35d>] __might_sleep+0xcd/0x100 > <4>[ 142.295220] [<c017d771>] kmem_cache_alloc+0xa1/0xc0 > <4>[ 142.295527] [<c0110f89>] dup_fd+0x29/0x2d0 > <4>[ 142.295689] [<c0111279>] copy_files+0x49/0x70 > <4>[ 142.295851] [<c0111c2f>] copy_process+0x6af/0x13d0 > <4>[ 142.296019] [<c0112b80>] do_fork+0x70/0x1b0 > <4>[ 142.296178] [<c0101078>] sys_clone+0x38/0x40 > <4>[ 142.296326] [<c010320f>] syscall_call+0x7/0xb > <4>[ 142.296641] ======================= > > I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I > attached my .config. The interesting thing is that this doesn't show up > with v2.3.x head (kernel & config identical). Switching back to 2.6.19 > doesn't change the picture. > > Anyone any idea? No-COW is now both in trunk and 2.3.x, right? In order to see if it is the effect of the no-cow patch, comment out the I-pipe portion in copy_one_pte, in mm/memory.c. This code should have an effect only if you are forking in a real-time application, though. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka 2007-02-11 22:26 ` Gilles Chanteperdrix @ 2007-02-11 22:31 ` Gilles Chanteperdrix 2007-02-11 22:42 ` Philippe Gerum 2 siblings, 0 replies; 17+ messages in thread From: Gilles Chanteperdrix @ 2007-02-11 22:31 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Hi, > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > startup procedure: > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > <4>[ 142.292747] ======================= > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 > <4>[ 142.293152] no locks held by init/1. 
> <4>[ 142.293244] irq event stamp: 500992 > <4>[ 142.293335] hardirqs last enabled at (500991): [<c010b77c>] __ipipe_handle_exception+0xdc/0x188 > <4>[ 142.293737] hardirqs last disabled at (500992): [<c010306e>] ret_from_exception+0xe/0x20 > <4>[ 142.293967] softirqs last enabled at (500868): [<c01189ab>] __do_softirq+0xab/0xc0 > <4>[ 142.294189] softirqs last disabled at (500861): [<c0118a55>] do_softirq+0x95/0xa0 > <4>[ 142.294562] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > <4>[ 142.294743] [<c0104237>] show_trace+0x17/0x20 > <4>[ 142.294897] [<c010432b>] dump_stack+0x1b/0x20 > <4>[ 142.295050] [<c010f35d>] __might_sleep+0xcd/0x100 > <4>[ 142.295220] [<c017d771>] kmem_cache_alloc+0xa1/0xc0 > <4>[ 142.295527] [<c0110f89>] dup_fd+0x29/0x2d0 > <4>[ 142.295689] [<c0111279>] copy_files+0x49/0x70 > <4>[ 142.295851] [<c0111c2f>] copy_process+0x6af/0x13d0 > <4>[ 142.296019] [<c0112b80>] do_fork+0x70/0x1b0 > <4>[ 142.296178] [<c0101078>] sys_clone+0x38/0x40 > <4>[ 142.296326] [<c010320f>] syscall_call+0x7/0xb > <4>[ 142.296641] ======================= > > I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I > attached my .config. The interesting thing is that this doesn't show up > with v2.3.x head (kernel & config identical). Switching back to 2.6.19 > doesn't change the picture. > > Anyone any idea? No-COW is now both in trunk and 2.3.x, right? alloc_page_vma in copy_one_pte uses the GFP_HIGHUSER flag, whereas the bug suggests we are in an atomic section, so maybe we should use GFP_ATOMIC | __GFP_HIGHMEM -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 17+ messages in thread
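The flag reasoning in the suggestion above can be sketched in user space. This is only a toy model under stated assumptions: the `SK_*` bit values and the function `sk_alloc_would_warn` are hypothetical stand-ins (the real definitions live in include/linux/gfp.h and vary between kernel versions), and the sketch illustrates why a sleeping allocation warns with IRQs off. It does not confirm that copy_one_pte is the culprit in the trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in bits, for illustration only. */
#define SK_GFP_WAIT    0x01u  /* allocator is allowed to sleep */
#define SK_GFP_HIGH    0x02u  /* may use emergency reserves */
#define SK_GFP_IO      0x04u
#define SK_GFP_FS      0x08u
#define SK_GFP_HIGHMEM 0x10u
#define SK_GFP_ALL     0x1fu

/* Assumed compositions: the atomic mask carries no "may sleep" bit. */
#define SK_GFP_ATOMIC   (SK_GFP_HIGH)
#define SK_GFP_HIGHUSER (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS | SK_GFP_HIGHMEM)

/*
 * Models a might_sleep_if()-style check: an allocation that may sleep
 * triggers the "sleeping function called from invalid context" report
 * when IRQs are disabled or the caller is atomic.
 */
static bool sk_alloc_would_warn(unsigned gfp, bool irqs_disabled, bool in_atomic)
{
    assert((gfp & ~SK_GFP_ALL) == 0);  /* reject bits unknown to the model */
    bool may_sleep = (gfp & SK_GFP_WAIT) != 0;
    return may_sleep && (irqs_disabled || in_atomic);
}
```

With the state reported in the log (in_atomic():0, irqs_disabled():1), the model warns for the HIGHUSER-style mask and stays quiet for the atomic one, with or without the highmem bit.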
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka 2007-02-11 22:26 ` Gilles Chanteperdrix 2007-02-11 22:31 ` Gilles Chanteperdrix @ 2007-02-11 22:42 ` Philippe Gerum 2007-02-11 23:07 ` Gilles Chanteperdrix 2 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-02-11 22:42 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote: > Hi, > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > startup procedure: > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > <4>[ 142.292747] ======================= > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 ^^^^ Typical of something going wrong in entry.S. > <4>[ 142.293152] no locks held by init/1. 
> <4>[ 142.293244] irq event stamp: 500992 > <4>[ 142.293335] hardirqs last enabled at (500991): [<c010b77c>] __ipipe_handle_exception+0xdc/0x188 > <4>[ 142.293737] hardirqs last disabled at (500992): [<c010306e>] ret_from_exception+0xe/0x20 > <4>[ 142.293967] softirqs last enabled at (500868): [<c01189ab>] __do_softirq+0xab/0xc0 > <4>[ 142.294189] softirqs last disabled at (500861): [<c0118a55>] do_softirq+0x95/0xa0 > <4>[ 142.294562] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > <4>[ 142.294743] [<c0104237>] show_trace+0x17/0x20 > <4>[ 142.294897] [<c010432b>] dump_stack+0x1b/0x20 > <4>[ 142.295050] [<c010f35d>] __might_sleep+0xcd/0x100 > <4>[ 142.295220] [<c017d771>] kmem_cache_alloc+0xa1/0xc0 > <4>[ 142.295527] [<c0110f89>] dup_fd+0x29/0x2d0 > <4>[ 142.295689] [<c0111279>] copy_files+0x49/0x70 > <4>[ 142.295851] [<c0111c2f>] copy_process+0x6af/0x13d0 > <4>[ 142.296019] [<c0112b80>] do_fork+0x70/0x1b0 > <4>[ 142.296178] [<c0101078>] sys_clone+0x38/0x40 > <4>[ 142.296326] [<c010320f>] syscall_call+0x7/0xb > <4>[ 142.296641] ======================= > > I'm seeing this with Xenomai trunk + the xnpod_suspend_thread patch. I > attached my .config. The interesting thing is that this doesn't show up > with v2.3.x head (kernel & config identical). Switching back to 2.6.19 > doesn't change the picture. > Could you disable the tracer and remove the xnpod_suspend_thread() patch, then downgrade to 1.6-04 to remove the COW support? TIA, > Anyone any idea? No-COW is now both in trunk and 2.3.x, right? > Only with the I-pipe patches from the 1.7 series on x86. > Jan > > _______________________________________________ > Xenomai-core mailing list > Xenomai-core@domain.hid > https://mail.gna.org/listinfo/xenomai-core -- Philippe. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-11 22:42 ` Philippe Gerum @ 2007-02-11 23:07 ` Gilles Chanteperdrix 2007-02-11 23:49 ` Philippe Gerum 0 siblings, 1 reply; 17+ messages in thread From: Gilles Chanteperdrix @ 2007-02-11 23:07 UTC (permalink / raw) To: rpm; +Cc: Jan Kiszka, xenomai-core Philippe Gerum wrote: > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote: > > Hi, > > > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > > startup procedure: > > > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > > <4>[ 142.292747] ======================= > > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 > ^^^^ > > Typical of something going wrong in entry.S. You mean, interrupts are not really disabled when forking ? :-) So, I am afraid the new fpu_counter optimization is buggy: if a task forks with fpu_counter greater than 5 and is preempted right after prepare_to_copy in dup_task_struct, when the system switches back to this task, the task FPU context will be restored and TS_USEDFPU set in the task flags, thereby voiding the effect of prepare_to_copy. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 17+ messages in thread
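The suspected fpu_counter race described above can be replayed with a small user-space model. Everything here is an assumption made for illustration: the `sk_` names are hypothetical, the threshold of 5 is taken from the message, and the real prepare_to_copy and context-switch code do much more than flip one flag.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_TS_USEDFPU          0x1u  /* stand-in for the thread status bit */
#define SK_FPU_EAGER_THRESHOLD 5     /* "fpu_counter greater than 5" */

struct sk_task {
    unsigned status;   /* SK_TS_USEDFPU set while FPU state is live in the CPU */
    int fpu_counter;   /* consecutive time slices that used the FPU */
    bool fpu_saved;    /* true once the registers were written back */
};

/* Models the unlazy step done before the task struct is duplicated. */
static void sk_prepare_to_copy(struct sk_task *t)
{
    if (t->status & SK_TS_USEDFPU) {
        t->fpu_saved = true;
        t->status &= ~SK_TS_USEDFPU;
    }
}

/* Models switching back to the task with the eager-restore optimization. */
static void sk_switch_in(struct sk_task *t)
{
    if (t->fpu_counter > SK_FPU_EAGER_THRESHOLD) {
        t->status |= SK_TS_USEDFPU;  /* FPU context restored up front */
        t->fpu_saved = false;        /* registers are live again */
    }
}

/* Returns true if the child copy ends up with the "FPU in use" flag set. */
static bool sk_fork_sees_live_fpu(int fpu_counter, bool preempted)
{
    struct sk_task parent = { SK_TS_USEDFPU, fpu_counter, false };
    assert(fpu_counter >= 0);
    sk_prepare_to_copy(&parent);
    if (preempted)
        sk_switch_in(&parent);       /* preemption right after prepare_to_copy */
    struct sk_task child = parent;   /* the struct copy done by fork */
    return (child.status & SK_TS_USEDFPU) != 0;
}
```

In this model the flag leaks into the child only when both conditions from the message hold: the counter is above the threshold and the preemption falls between the unlazy step and the copy.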
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-11 23:07 ` Gilles Chanteperdrix @ 2007-02-11 23:49 ` Philippe Gerum 2007-02-12 0:20 ` Gilles Chanteperdrix 0 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-02-11 23:49 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote: > Philippe Gerum wrote: > > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote: > > > Hi, > > > > > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > > > startup procedure: > > > > > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > > > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > > > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > > > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > > > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > > > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > > > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > > > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > > > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > > > <4>[ 142.292747] ======================= > > > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > > > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 > > ^^^^ > > > > Typical of something going wrong in entry.S. > > You mean, interrupts are not really disabled when forking ? :-) > Eh, mmmh, no. Hopefully. > So, I am afraid the new fpu_counter optimization is buggy: if a task > forks with fpu_counter greater than 5 and is preempted right after > prepare_to_copy in dup_task_struct, when the system switches back to > this task, the task FPU context will be restored and TS_USEDFPU set in > the task flags, thereby voiding the effect of prepare_to_copy. > You mean that the parent FPU context would leak into the child's one? 
Well, maybe the LKML people would like to know about this. As a side note, I don't see anything wrong with your latest countermeasure disabling this optimization in Xenomai's context switch code, even in the buggy case above. Right? -- Philippe. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-11 23:49 ` Philippe Gerum @ 2007-02-12 0:20 ` Gilles Chanteperdrix 2007-02-12 0:28 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Gilles Chanteperdrix @ 2007-02-12 0:20 UTC (permalink / raw) To: rpm; +Cc: Jan Kiszka, xenomai-core Philippe Gerum wrote: > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote: > > Philippe Gerum wrote: > > > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote: > > > > Hi, > > > > > > > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > > > > startup procedure: > > > > > > > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > > > > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > > > > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > > > > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > > > > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > > > > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > > > > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > > > > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > > > > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > > > > <4>[ 142.292747] ======================= > > > > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > > > > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 > > > ^^^^ > > > > > > Typical of something going wrong in entry.S. > > > > You mean, interrupts are not really disabled when forking ? :-) > > > > Eh, mmmh, no. Hopefully. > > > So, I am afraid the new fpu_counter optimization is buggy: if a task > > forks with fpu_counter greater than 5 and is preempted right after > > prepare_to_copy in dup_task_struct, when the system switches back to > > this task, the task FPU context will be restored and TS_USEDFPU set in > > the task flags, thereby voiding the effect of prepare_to_copy. 
> > > > You mean that the parent FPU context would leak into the child's one? Yes, something like that. The result is random segfaults; I do not remember exactly why. > Well, maybe the LKML people would like to know about this. As a > side note, I don't see anything wrong with your latest countermeasure > disabling this optimization in Xenomai's context switch code, even in > the buggy case above. Right? Right, if there are random segfaults, they will not be Xenomai's fault. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-12 0:20 ` Gilles Chanteperdrix @ 2007-02-12 0:28 ` Jan Kiszka 2007-02-12 1:10 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-02-12 0:28 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 2706 bytes --] Gilles Chanteperdrix wrote: > Philippe Gerum wrote: > > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote: > > > Philippe Gerum wrote: > > > > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote: > > > > > Hi, > > > > > > > > > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave > > > > > startup procedure: > > > > > > > > > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) > > > > > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() > > > > > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 > > > > > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 > > > > > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 > > > > > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 > > > > > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 > > > > > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 > > > > > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb > > > > > <4>[ 142.292747] ======================= > > > > > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 > > > > > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 > > > > ^^^^ > > > > > > > > Typical of something going wrong in entry.S. > > > > > > You mean, interrupts are not really disabled when forking ? :-) > > > > > > > Eh, mmmh, no. Hopefully. 
> > > > > So, I am afraid the new fpu_counter optimization is buggy: if a task > > > forks with fpu_counter greater than 5 and is preempted right after > > > prepare_to_copy in dup_task_struct, when the system switches back to > > > this task, the task FPU context will be restored and TS_USEDFPU set in > > > the task flags, thereby voiding the effect of prepare_to_copy. > > > > > > > You mean that the parent FPU context would leak into the child's one? > > Yes, something like that. The result is random segfaults, I do not > remember exactly why. > > > Well, maybe the LKML people would like to know about this. As a > > sidenote, I don't see anything bad with your latest counter-measure > > disabling this optimization in Xenomai's context switch code, even in > > the bugous case above. Right? > > Right, if there are random segfaults, they will not be xenomai's fault. > I'm currently sorting the symptoms again, or better I'm looking where they went to. 2.6.20 just decided to work normally again, 2.6.19 needs a re-check. It appears now that the tracer played an important role, but I'm not 100% sure yet. I'll keep you posted. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 249 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-12 0:28 ` Jan Kiszka @ 2007-02-12 1:10 ` Jan Kiszka 2007-02-12 11:49 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-02-12 1:10 UTC (permalink / raw) To: Philippe Gerum, Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1.1: Type: text/plain, Size: 3367 bytes --] Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Philippe Gerum wrote: >> > On Mon, 2007-02-12 at 00:07 +0100, Gilles Chanteperdrix wrote: >> > > Philippe Gerum wrote: >> > > > On Sun, 2007-02-11 at 23:13 +0100, Jan Kiszka wrote: >> > > > > Hi, >> > > > > >> > > > > while testing 2.6.20 with RTnet, I got this kernel BUG during the slave >> > > > > startup procedure: >> > > > > >> > > > > <4>[ 137.799234] TDMA: calibrated master-to-slave packet delay: 34 us (min/max: 33/38 us) >> > > > > <4>[ 142.291455] BUG: at kernel/fork.c:993 copy_process() >> > > > > <4>[ 142.291585] [<c0103a8f>] show_trace_log_lvl+0x1f/0x40 >> > > > > <4>[ 142.291767] [<c0104237>] show_trace+0x17/0x20 >> > > > > <4>[ 142.291896] [<c010432b>] dump_stack+0x1b/0x20 >> > > > > <4>[ 142.292026] [<c0111e94>] copy_process+0x914/0x13d0 >> > > > > <4>[ 142.292190] [<c0112b80>] do_fork+0x70/0x1b0 >> > > > > <4>[ 142.292323] [<c0101078>] sys_clone+0x38/0x40 >> > > > > <4>[ 142.292620] [<c010320f>] syscall_call+0x7/0xb >> > > > > <4>[ 142.292747] ======================= >> > > > > <3>[ 142.292860] BUG: sleeping function called from invalid context at mm/slab.c:3034 >> > > > > <4>[ 142.293052] in_atomic():0, irqs_disabled():1 >> > > > ^^^^ >> > > > >> > > > Typical of something going wrong in entry.S. >> > > >> > > You mean, interrupts are not really disabled when forking ? :-) >> > > >> > >> > Eh, mmmh, no. Hopefully. 
>> > >> > > So, I am afraid the new fpu_counter optimization is buggy: if a task >> > > forks with fpu_counter greater than 5 and is preempted right after >> > > prepare_to_copy in dup_task_struct, when the system switches back to >> > > this task, the task FPU context will be restored and TS_USEDFPU set in >> > > the task flags, thereby voiding the effect of prepare_to_copy. >> > > >> > >> > You mean that the parent FPU context would leak into the child's one? >> >> Yes, something like that. The result is random segfaults, I do not >> remember exactly why. >> >> > Well, maybe the LKML people would like to know about this. As a >> > sidenote, I don't see anything bad with your latest counter-measure >> > disabling this optimization in Xenomai's context switch code, even in >> > the bugous case above. Right? >> >> Right, if there are random segfaults, they will not be xenomai's fault. >> > > I'm currently sorting the symptoms again, or better I'm looking where > they went to. 2.6.20 just decided to work normally again, 2.6.19 needs a > re-check. > > It appears now that the tracer played an important role, but I'm not > 100% sure yet. I'll keep you posted. 2.6.19 didn't magically start to work as well. Instead I have a back trace now, see attachment. I included a full set of 16k points, but the thrilling things are around -73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet stack manager, prio 98). But when returning to Linux again, its IRQs remain masked now. The reason must be that weird exception at -62. Don't know where it comes from and why is there no report about THAT issue in the kernel logs. OK, enough for today. Jan [-- Attachment #1.2: back-trace.bz2 --] [-- Type: application/x-bzip, Size: 70700 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 249 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-12 1:10 ` Jan Kiszka @ 2007-02-12 11:49 ` Jan Kiszka 2007-02-12 13:16 ` Gilles Chanteperdrix 0 siblings, 1 reply; 17+ messages in thread From: Jan Kiszka @ 2007-02-12 11:49 UTC (permalink / raw) To: Philippe Gerum; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 3640 bytes --] Jan Kiszka wrote: > 2.6.19 didn't magically start to work as well. Instead I have a back > trace now, see attachment. > > I included a full set of 16k points, but the thrilling things are around > -73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ > (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet > stack manager, prio 98). But when returning to Linux again, its IRQs > remain masked now. The reason must be that weird exception at -62. Don't > know where it comes from and why is there no report about THAT issue in > the kernel logs. The cause of this page fault will get tracked down later today, but the way it is handled already causes some doubts to me. 
To make discussion easier, here is the relevant excerpt from the trace: > : +func -73+ 1.426 link_path_walk+0x14 (__link_path_walk+0xca0) > :| +func -72 0.605 __ipipe_handle_irq+0x14 (common_interrupt+0x18) > :| +func -71 0.472 __ipipe_ack_irq+0x8 (__ipipe_handle_irq+0xaf) > :| +func -70 0.224 __ipipe_ack_level_irq+0x12 (__ipipe_ack_irq+0x19) > :| +func -70+ 4.424 mask_and_ack_8259A+0x14 (__ipipe_ack_level_irq+0x22) > :| +func -66 0.475 __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x62) > :| # func -65 0.974 xnintr_irq_handler+0xe (__ipipe_dispatch_wired+0x95) > :| # func -64+ 1.892 rtl8139_interrupt+0x11 [rt_8139too] (xnintr_irq_handler+0x3b) > :| # func -62 0.382 __ipipe_handle_exception+0xe (error_code+0x3e) > :| # func -62 0.222 __ipipe_test_root+0x8 (__ipipe_handle_exception+0x1a) > :| # func -62 0.377 __ipipe_stall_root+0x8 (__ipipe_handle_exception+0x15b) > :| #*func -62 0.173 trace_hardirqs_off+0xc (__ipipe_handle_exception+0x165) > :| #*func -61 0.211 __ipipe_test_root+0x8 (trace_hardirqs_off+0x2d) > :| #*func -61+ 1.965 do_page_fault+0xe (__ipipe_handle_exception+0x6d) > : #*func -59 0.180 trace_hardirqs_on+0x11 (__ipipe_handle_exception+0xd9) > : #*func -59 0.163 __ipipe_test_root+0x8 (trace_hardirqs_on+0x5e) > : #*func -59 0.396 mark_held_locks+0xe (trace_hardirqs_on+0x8b) > : #*func -58 0.212 mark_held_locks+0xe (trace_hardirqs_on+0xc9) > : #*func -58 0.461 __ipipe_restore_root+0x8 (__ipipe_handle_exception+0xe1) > : #*func -58 0.253 __ipipe_unstall_root+0x8 (__ipipe_restore_root+0x18) > : # func -57 0.224 __ipipe_stall_root+0x8 (ret_from_exception+0x5) > : #*func -57 0.366 trace_hardirqs_off+0xc (ret_from_exception+0xe) > : #*func -57 0.327 __ipipe_test_root+0x8 (trace_hardirqs_off+0x2d) > :| #*func -57+ 2.089 __ipipe_unstall_iret_root+0x8 (restore_nocheck_notrace+0x0) > :| #*func -54+ 1.444 alloc_rtskb+0xa [rtnet] (rtl8139_interrupt+0x182 [rt_8139too]) > :| #*func -53+ 1.172 rt_eth_type_trans+0xe [rtnet] (rtl8139_interrupt+0x1d6 [rt_8139too]) The 
fault gets forwarded to Linux because ipipe_trap_notify does not stop it: we are running neither over a task with PF_EVNOTIFY set nor over a kernel thread yet (IPIPE_NOSTACK_FLAG). Still, we are already in the primary domain, so I wonder whether this forwarding is intentional. At least it seems to break some things later on... Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-12 11:49 ` Jan Kiszka @ 2007-02-12 13:16 ` Gilles Chanteperdrix 2007-02-12 13:46 ` Philippe Gerum 0 siblings, 1 reply; 17+ messages in thread From: Gilles Chanteperdrix @ 2007-02-12 13:16 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Jan Kiszka wrote: > >>2.6.19 didn't magically start to work as well. Instead I have a back >>trace now, see attachment. >> >>I included a full set of 16k points, but the thrilling things are around >>-73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ >>(RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet >>stack manager, prio 98). But when returning to Linux again, its IRQs >>remain masked now. The reason must be that weird exception at -62. Don't >>know where it comes from and why is there no report about THAT issue in >>the kernel logs. > > > The cause of this page fault will get tracked down later today, but the > way it is handled already causes some doubts to me. To make discussion > easier, here is the relevant excerpt from the trace: Maybe this fault is due to the No-cow patch ? Before the no-cow patch, vmalloced areas were added to all processes page directories, now they are added only to the page directories of processes with the VM_PINNED flag. So, if ipipe_test_root tries to access some module memory area over the context of a non-realtime thread, a fault will occur. -- Gilles Chanteperdrix ^ permalink raw reply [flat|nested] 17+ messages in thread
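The on-demand mapping behaviour described above can be illustrated with a toy model. This is a sketch under assumptions, not the kernel's or I-pipe's code: `sk_master` stands in for the reference kernel page tables, `struct sk_mm` for one process's page directory, and the `pinned` field for the VM_PINNED flag mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NR_SLOTS 8

/* Hypothetical per-process view of the kernel's vmalloc-style mappings. */
struct sk_mm {
    bool pinned;                 /* stand-in for the VM_PINNED flag */
    bool present[SK_NR_SLOTS];   /* entries present in this page directory */
};

/* Reference mappings that every process can lazily sync from. */
static bool sk_master[SK_NR_SLOTS];

/* Map a new area: always recorded in the master, eagerly only if pinned. */
static void sk_map_area(struct sk_mm *mm, size_t slot)
{
    assert(slot < SK_NR_SLOTS);
    sk_master[slot] = true;
    if (mm->pinned)
        mm->present[slot] = true;
}

/* Touch an area: 0 = no fault, 1 = minor fault fixed up, -1 = real fault. */
static int sk_touch(struct sk_mm *mm, size_t slot)
{
    assert(slot < SK_NR_SLOTS);
    if (mm->present[slot])
        return 0;
    if (!sk_master[slot])
        return -1;
    mm->present[slot] = true;    /* lazy sync from the master tables */
    return 1;
}

/* Faults taken on the first access to a freshly mapped area. */
static int sk_first_touch_faults(bool pinned)
{
    struct sk_mm mm = { .pinned = pinned };
    for (size_t i = 0; i < SK_NR_SLOTS; i++)
        sk_master[i] = false;
    sk_map_area(&mm, 3);
    return sk_touch(&mm, 3);
}
```

In the model, a pinned process never faults on module memory, while an unpinned one takes exactly one minor fault on first access, which matches the kind of silent fault seen at -62 in the trace.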
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state 2007-02-12 13:16 ` Gilles Chanteperdrix @ 2007-02-12 13:46 ` Philippe Gerum 2007-02-12 13:49 ` Jan Kiszka 0 siblings, 1 reply; 17+ messages in thread From: Philippe Gerum @ 2007-02-12 13:46 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Jan Kiszka wrote: > > > >>2.6.19 didn't magically start to work as well. Instead I have a back > >>trace now, see attachment. > >> > >>I included a full set of 16k points, but the thrilling things are around > >>-73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ > >>(RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet > >>stack manager, prio 98). But when returning to Linux again, its IRQs > >>remain masked now. The reason must be that weird exception at -62. Don't > >>know where it comes from and why is there no report about THAT issue in > >>the kernel logs. > > > > > > The cause of this page fault will get tracked down later today, but the > > way it is handled already causes some doubts to me. To make discussion > > easier, here is the relevant excerpt from the trace: > > Maybe this fault is due to the No-cow patch ? Before the no-cow patch, > vmalloced areas were added to all processes page directories, now they > are added only to the page directories of processes with the VM_PINNED > flag. So, if ipipe_test_root tries to access some module memory area > over the context of a non-realtime thread, a fault will occur. > Yes, it's a minor fault occurring due to on-demand memory mapping, this is why we don't get any alarming message in the kernel log. -- Philippe. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 13:46 ` Philippe Gerum
@ 2007-02-12 13:49 ` Jan Kiszka
  2007-02-12 14:10 ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kiszka @ 2007-02-12 13:49 UTC
To: rpm; +Cc: xenomai-core

Philippe Gerum wrote:
> On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>
>>>> 2.6.19 didn't magically start to work as well. Instead I have a back
>>>> trace now, see attachment.
>>>>
>>>> I included a full set of 16k points, but the thrilling things are around
>>>> -73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
>>>> (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
>>>> stack manager, prio 98). But when returning to Linux again, its IRQs
>>>> remain masked now. The reason must be that weird exception at -62. Don't
>>>> know where it comes from and why is there no report about THAT issue in
>>>> the kernel logs.
>>>
>>> The cause of this page fault will get tracked down later today, but the
>>> way it is handled already causes some doubts to me. To make discussion
>>> easier, here is the relevant excerpt from the trace:
>> Maybe this fault is due to the No-cow patch ? Before the no-cow patch,
>> vmalloced areas were added to all processes page directories, now they
>> are added only to the page directories of processes with the VM_PINNED
>> flag. So, if ipipe_test_root tries to access some module memory area
>> over the context of a non-realtime thread, a fault will occur.
>>
>
> Yes, it's a minor fault occurring due to on-demand memory mapping, this
> is why we don't get any alarming message in the kernel log.
>

Looks like it's something that should never happen, for sure. But are we
fine with screwing up the Linux IRQ state nevertheless? In other words,
are we seeing one or two ipipe issues here?
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 13:49 ` Jan Kiszka
@ 2007-02-12 14:10 ` Philippe Gerum
  2007-02-12 14:39 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-12 14:10 UTC
To: Jan Kiszka; +Cc: xenomai-core

On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
> >> Jan Kiszka wrote:
> >>> Jan Kiszka wrote:
> >>>
> >>>> 2.6.19 didn't magically start to work as well. Instead I have a back
> >>>> trace now, see attachment.
> >>>>
> >>>> I included a full set of 16k points, but the thrilling things are around
> >>>> -73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
> >>>> (RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
> >>>> stack manager, prio 98). But when returning to Linux again, its IRQs
> >>>> remain masked now. The reason must be that weird exception at -62. Don't
> >>>> know where it comes from and why is there no report about THAT issue in
> >>>> the kernel logs.
> >>>
> >>> The cause of this page fault will get tracked down later today, but the
> >>> way it is handled already causes some doubts to me. To make discussion
> >>> easier, here is the relevant excerpt from the trace:
> >> Maybe this fault is due to the No-cow patch ? Before the no-cow patch,
> >> vmalloced areas were added to all processes page directories, now they
> >> are added only to the page directories of processes with the VM_PINNED
> >> flag. So, if ipipe_test_root tries to access some module memory area
> >> over the context of a non-realtime thread, a fault will occur.
> >>
> >
> > Yes, it's a minor fault occurring due to on-demand memory mapping, this
> > is why we don't get any alarming message in the kernel log.
> >
>
> Looks like it's something that should never happen, for sure.

Now that vmalloc & ioremap memory may have their pte set on demand anew
due to the nocow patch, minor faults in kernel space are possible again,
but this should only happen on behalf of the Linux domain, this is not
expected to happen in primary mode.

> But are we
> fine with screwing up the Linux IRQ state nevertheless? In other words,
> are we seeing one or two ipipe issues here?

The I-pipe would only restore the virtual flag as seen on entry from an
exception on behalf of the Linux domain, not in primary mode.

--
Philippe.
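Philippe's last point, that the virtual flag is restored only for exceptions taken on behalf of the Linux domain, can be illustrated with a toy model in Python. It is a sketch under stated assumptions, not the I-pipe implementation: `RootDomain`, `handle_exception` and `fault_handler_leaving_irqs_off` are hypothetical names, and the handler is simply assumed to leave the virtual IRQ flag masked when it returns.

```python
class RootDomain:
    """Hypothetical stand-in for the Linux (root) domain."""

    def __init__(self):
        # Virtual IRQ mask of the Linux domain: True means "IRQs masked".
        self.stalled = False


def fault_handler_leaving_irqs_off(root):
    """Assumed behaviour: the fault path exits with the virtual flag masked."""
    root.stalled = True


def handle_exception(root, in_primary_mode, handler):
    """Run `handler`, restoring the flag seen on entry only for the Linux domain.

    Returns the virtual flag that Linux observes afterwards.
    """
    saved = root.stalled
    handler(root)
    if not in_primary_mode:
        root.stalled = saved   # restore happens on behalf of the Linux domain only
    return root.stalled
```

In this model an exception taken on behalf of Linux returns with the flag as it was on entry, whereas the same exception taken in primary mode leaves `stalled` set, which would match the symptom of Linux IRQs remaining masked.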
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 14:10 ` Philippe Gerum
@ 2007-02-12 14:39 ` Gilles Chanteperdrix
  2007-02-12 15:10 ` Philippe Gerum
  0 siblings, 1 reply; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-12 14:39 UTC
To: rpm; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
> On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>
>>Philippe Gerum wrote:
>>
>>>On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>
>>>>Jan Kiszka wrote:
>>>>
>>>>>Jan Kiszka wrote:
>>>>>
>>>>>
>>>>>>2.6.19 didn't magically start to work as well. Instead I have a back
>>>>>>trace now, see attachment.
>>>>>>
>>>>>>I included a full set of 16k points, but the thrilling things are around
>>>>>>-73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
>>>>>>(RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
>>>>>>stack manager, prio 98). But when returning to Linux again, its IRQs
>>>>>>remain masked now. The reason must be that weird exception at -62. Don't
>>>>>>know where it comes from and why is there no report about THAT issue in
>>>>>>the kernel logs.
>>>>>
>>>>>The cause of this page fault will get tracked down later today, but the
>>>>>way it is handled already causes some doubts to me. To make discussion
>>>>>easier, here is the relevant excerpt from the trace:
>>>>
>>>>Maybe this fault is due to the No-cow patch ? Before the no-cow patch,
>>>>vmalloced areas were added to all processes page directories, now they
>>>>are added only to the page directories of processes with the VM_PINNED
>>>>flag. So, if ipipe_test_root tries to access some module memory area
>>>>over the context of a non-realtime thread, a fault will occur.
>>>>
>>>
>>>Yes, it's a minor fault occurring due to on-demand memory mapping, this
>>>is why we don't get any alarming message in the kernel log.
>>>
>>
>>Looks like it's something that should never happen, for sure.
>
>
> Now that vmalloc & ioremap memory may have their pte set on demand anew
> due to the nocow patch, minor faults in kernel space are possible again,
> but this should only happen on behalf of the Linux domain, this is not
> expected to happen in primary mode.

Does not a primary mode IRQ handler borrow the mmu context from the
tasks it preempts ?

--
Gilles Chanteperdrix
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 14:39 ` Gilles Chanteperdrix
@ 2007-02-12 15:10 ` Philippe Gerum
  2007-02-12 19:02 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Gerum @ 2007-02-12 15:10 UTC
To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai-core

On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
> >
> >>Philippe Gerum wrote:
> >>
> >>>On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
> >>>
> >>>>Jan Kiszka wrote:
> >>>>
> >>>>>Jan Kiszka wrote:
> >>>>>
> >>>>>
> >>>>>>2.6.19 didn't magically start to work as well. Instead I have a back
> >>>>>>trace now, see attachment.
> >>>>>>
> >>>>>>I included a full set of 16k points, but the thrilling things are around
> >>>>>>-73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
> >>>>>>(RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
> >>>>>>stack manager, prio 98). But when returning to Linux again, its IRQs
> >>>>>>remain masked now. The reason must be that weird exception at -62. Don't
> >>>>>>know where it comes from and why is there no report about THAT issue in
> >>>>>>the kernel logs.
> >>>>>
> >>>>>The cause of this page fault will get tracked down later today, but the
> >>>>>way it is handled already causes some doubts to me. To make discussion
> >>>>>easier, here is the relevant excerpt from the trace:
> >>>>
> >>>>Maybe this fault is due to the No-cow patch ? Before the no-cow patch,
> >>>>vmalloced areas were added to all processes page directories, now they
> >>>>are added only to the page directories of processes with the VM_PINNED
> >>>>flag. So, if ipipe_test_root tries to access some module memory area
> >>>>over the context of a non-realtime thread, a fault will occur.
> >>>>
> >>>
> >>>Yes, it's a minor fault occurring due to on-demand memory mapping, this
> >>>is why we don't get any alarming message in the kernel log.
> >>>
> >>
> >>Looks like it's something that should never happen, for sure.
> >
> >
> > Now that vmalloc & ioremap memory may have their pte set on demand anew
> > due to the nocow patch, minor faults in kernel space are possible again,
> > but this should only happen on behalf of the Linux domain, this is not
> > expected to happen in primary mode.
>
> Does not a primary mode IRQ handler borrow the mmu context from the
> tasks it preempts ?
>

Yes, this is where the problem stands if we happen to preempt a regular
task and tread over code which might trigger minor faults. The best way
to check this would be to somehow enable VM_PINNED for all tasks. Back
to square #1.

--
Philippe.
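The check Philippe proposes above can be sketched as a short Python experiment. Everything here is an illustrative assumption rather than real Xenomai or kernel code: `Task` and `faulting_preemptions` are hypothetical names, `vm_pinned` stands in for the VM_PINNED flag, and the handler is assumed to touch memory that only pinned tasks have mapped.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a Linux task."""
    name: str
    vm_pinned: bool   # plays the role of the VM_PINNED flag


def faulting_preemptions(tasks, pin_all=False):
    """List the tasks whose preemption by an RT IRQ would end in a minor fault.

    In this model the primary mode handler has no address space of its own:
    it runs on the page tables of whichever task it interrupted, so the
    memory it touches is only mapped if that task is pinned.
    """
    return [t.name for t in tasks if not (t.vm_pinned or pin_all)]
```

With only the real-time task pinned, every regular task is a candidate for a fault in primary mode; with `pin_all=True` the list is empty, which is the outcome the proposed test would look for.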
* Re: [Xenomai-core] [BUG] trunk: screwed Linux irq state
  2007-02-12 15:10 ` Philippe Gerum
@ 2007-02-12 19:02 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 17+ messages in thread
From: Gilles Chanteperdrix @ 2007-02-12 19:02 UTC
To: rpm; +Cc: Jan Kiszka, xenomai-core

Philippe Gerum wrote:
> On Mon, 2007-02-12 at 15:39 +0100, Gilles Chanteperdrix wrote:
>
>>Philippe Gerum wrote:
>>
>>>On Mon, 2007-02-12 at 14:49 +0100, Jan Kiszka wrote:
>>>
>>>
>>>>Philippe Gerum wrote:
>>>>
>>>>
>>>>>On Mon, 2007-02-12 at 14:16 +0100, Gilles Chanteperdrix wrote:
>>>>>
>>>>>
>>>>>>Jan Kiszka wrote:
>>>>>>
>>>>>>
>>>>>>>Jan Kiszka wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>2.6.19 didn't magically start to work as well. Instead I have a back
>>>>>>>>trace now, see attachment.
>>>>>>>>
>>>>>>>>I included a full set of 16k points, but the thrilling things are around
>>>>>>>>-73 to -25: Some Linux process with IRQs on gets preempted by an RT-IRQ
>>>>>>>>(RTnet NIC). That triggers an RT kernel thread to run for a while (RTnet
>>>>>>>>stack manager, prio 98). But when returning to Linux again, its IRQs
>>>>>>>>remain masked now. The reason must be that weird exception at -62. Don't
>>>>>>>>know where it comes from and why is there no report about THAT issue in
>>>>>>>>the kernel logs.
>>>>>>>
>>>>>>>The cause of this page fault will get tracked down later today, but the
>>>>>>>way it is handled already causes some doubts to me. To make discussion
>>>>>>>easier, here is the relevant excerpt from the trace:
>>>>>>
>>>>>>Maybe this fault is due to the No-cow patch ? Before the no-cow patch,
>>>>>>vmalloced areas were added to all processes page directories, now they
>>>>>>are added only to the page directories of processes with the VM_PINNED
>>>>>>flag. So, if ipipe_test_root tries to access some module memory area
>>>>>>over the context of a non-realtime thread, a fault will occur.
>>>>>>
>>>>>
>>>>>Yes, it's a minor fault occurring due to on-demand memory mapping, this
>>>>>is why we don't get any alarming message in the kernel log.
>>>>>
>>>>
>>>>Looks like it's something that should never happen, for sure.
>>>
>>>
>>>Now that vmalloc & ioremap memory may have their pte set on demand anew
>>>due to the nocow patch, minor faults in kernel space are possible again,
>>>but this should only happen on behalf of the Linux domain, this is not
>>>expected to happen in primary mode.
>>
>>Does not a primary mode IRQ handler borrow the mmu context from the
>>tasks it preempts ?
>>
>
>
> Yes, this is where the problem stands if we happen to preempt a regular
> task and tread over code which might trigger minor faults. The best way
> to check this would be to somehow enable VM_PINNED for all tasks. Back
> to square #1.
>

Ok. I'll try to change this and send a patch ASAP.

--
Gilles Chanteperdrix
end of thread

Thread overview: 17+ messages
2007-02-11 22:13 [Xenomai-core] [BUG] trunk: screwed Linux irq state Jan Kiszka
2007-02-11 22:26 ` Gilles Chanteperdrix
2007-02-11 22:31 ` Gilles Chanteperdrix
2007-02-11 22:42 ` Philippe Gerum
2007-02-11 23:07 ` Gilles Chanteperdrix
2007-02-11 23:49 ` Philippe Gerum
2007-02-12 0:20 ` Gilles Chanteperdrix
2007-02-12 0:28 ` Jan Kiszka
2007-02-12 1:10 ` Jan Kiszka
2007-02-12 11:49 ` Jan Kiszka
2007-02-12 13:16 ` Gilles Chanteperdrix
2007-02-12 13:46 ` Philippe Gerum
2007-02-12 13:49 ` Jan Kiszka
2007-02-12 14:10 ` Philippe Gerum
2007-02-12 14:39 ` Gilles Chanteperdrix
2007-02-12 15:10 ` Philippe Gerum
2007-02-12 19:02 ` Gilles Chanteperdrix