* set_fiq_handler: Bad mode in data abort handler detected
       [not found] <2527501.cXAbiV8bqS@dabox>
@ 2014-04-24 10:31 ` Russell King - ARM Linux
  2014-04-24 11:57   ` Tim Sander
  2014-04-24 14:33   ` Tim Sander
  0 siblings, 2 replies; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-04-24 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

Please address kernel related problems to the linux-arm-kernel mailing
list in preference to linux-arm.  Thanks.

On Thu, Apr 24, 2014 at 11:46:15AM +0200, Tim Sander wrote:
> I have installed a FIQ handler with set_fiq_handler on a Xilinx Zynq.
> I had to enable the FIQ symbol in kconfig for the Zynq as it's not enabled
> by default. As I was not able to boot a mainline kernel I used the 3.12 kernel
> of the xilinx repository at github. But as there are no changes in the FIQ handler
> stuff I guess that does not matter. The Zynq is a dual ARMv7 Cortex-A9.
> The handler works for a random timespan and then I see:

The first rule of FIQs is that they are not permitted to cause any
aborts whatsoever - any aborts can be fatal as they can cause
deadlock.

> Bad mode in data abort handler detected
> Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> Modules linked in: firq(O) ipv6
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.12.0-xilinx-dirty #54
> task: c05bd420 ti: c05b2000 task.ti: c05b2000
> PC is at 0xffff1224
> LR is at arch_cpu_idle+0x20/0x2c
> pc : [<ffff1224>]    lr : [<c000f344>]    psr: 600e01d1
> sp : c05b3f70  ip : 00000000  fp : 00000000
> r10: 00000000  r9 : 413fc090  r8 : c0a264c0
> r7 : c05a7720  r6 : c04080c8  r5 : c05f2500  r4 : c05b2000
> r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c0a299f8
> Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment kernel
> Control: 18c5387d  Table: 1ec0404a  DAC: 00000015
> Process swapper/0 (pid: 0, stack limit = 0xc05b2240)
> Stack: (0xc05b3f70 to 0xc05b4000)
> 3f60:                                     c0a299f8 00000000 00000000 00000000
> 3f80: c05b2000 c05f2500 c04080c8 c05a7720 c0a264c0 413fc090 00000000 00000000
> 3fa0: 00000000 c05b3f70 c000f344 ffff1224 600e01d1 ffffffff 00000000 c0055fb8
> 3fc0: c040a7b0 c0584a5c ffffffff ffffffff c0584574 00000000 00000000 c05a7720
> 3fe0: 18c5387d c05ba3cc c05a771c c05be440 0000406a 00008074 00000000 00000000
> [<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] (  (null))
> Code: e320f000 e320f000 e320f000 eafffffe (e5889000) 

The faulting instruction was:

	str r9, [r8]

However, the register dump above does not include the FIQ banked registers,
so we don't actually know what r8 was.
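
As a cross-check of the decode above, here is a minimal sketch in C that
picks apart the bracketed word from the Code: line (e5889000).  It only
handles the A32 LDR/STR form with a 12-bit immediate offset, and the
struct and function names are made up for illustration; none of this is
kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an A32 LDR/STR with a 12-bit immediate offset. */
struct ldr_str_imm {
    bool is_load;      /* L bit (20): true = LDR, false = STR */
    bool is_byte;      /* B bit (22): true = byte access */
    bool add_offset;   /* U bit (23): true = base + offset */
    bool pre_indexed;  /* P bit (24) */
    bool writeback;    /* W bit (21) */
    unsigned rn;       /* base register, bits 19:16 */
    unsigned rd;       /* transferred register, bits 15:12 */
    unsigned imm12;    /* offset, bits 11:0 */
};

/* Returns false if the word is not an immediate-offset LDR/STR. */
static bool decode_ldr_str_imm(uint32_t insn, struct ldr_str_imm *out)
{
    /* Condition field 0b1111 belongs to the unconditional space. */
    if ((insn >> 28) == 0xf)
        return false;
    /* Bits 27:25 must be 0b010 for the immediate-offset form. */
    if (((insn >> 25) & 0x7) != 0x2)
        return false;

    out->pre_indexed = (insn >> 24) & 1;
    out->add_offset  = (insn >> 23) & 1;
    out->is_byte     = (insn >> 22) & 1;
    out->writeback   = (insn >> 21) & 1;
    out->is_load     = (insn >> 20) & 1;
    out->rn          = (insn >> 16) & 0xf;
    out->rd          = (insn >> 12) & 0xf;
    out->imm12       = insn & 0xfff;
    return true;
}
```

Fed 0xe5889000, this reports a word store with rd = 9, rn = 8 and a zero
offset, i.e. str r9, [r8].  The other words on the Code: line (e320f000,
eafffffe) are rejected because they are not LDR/STR encodings.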

> My first guess would be that i had a cache page miss in the fiq handler?

Yes.

> I guess the best way would be putting the fiq-handler on the On Chip
> Memory but then i would still have the same problem that the code jumping
> to the OCM would have a cache miss?

I'm guessing that the address pointed to by r8 (the timer base) is
ioremapped after other threads are already started?  The problem with
that is other threads won't have the L1 page table pointers for these
mappings - we populate these lazily because trying to do it at
ioremap() time would be extremely painful.

What might be possible is to have a function which can be called in
these circumstances which ensures that a kernel address is accessible
to all threads in the system, though while it's doing that, it would
have to stop any fork() or exit() activity to be sure that it updated
every thread.

In years gone by, I'd have recommended that the kernel mappings for
this stuff were done via static mappings, but with DT, that's no
longer acceptable.  So I guess we have a problem...

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.


* set_fiq_handler: Bad mode in data abort handler detected
  2014-04-24 10:31 ` set_fiq_handler: Bad mode in data abort handler detected Russell King - ARM Linux
@ 2014-04-24 11:57   ` Tim Sander
  2014-04-24 14:33   ` Tim Sander
  1 sibling, 0 replies; 10+ messages in thread
From: Tim Sander @ 2014-04-24 11:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell

On Thursday, 24 April 2014, 11:31:37, Russell King - ARM Linux wrote:
> Please address kernel related problems to the linux-arm-kernel mailing
> list in preference to linux-arm.  Thanks.
Sorry, I thought linux-arm was the right list.

> On Thu, Apr 24, 2014 at 11:46:15AM +0200, Tim Sander wrote:
> > I have installed a FIQ handler with set_fiq_handler on a Xilinx Zynq.
> > I had to enable the FIQ symbol in kconfig for the Zynq as it's not
> > enabled by default. As I was not able to boot a mainline kernel I used
> > the 3.12 kernel of the xilinx repository at github. But as there are no
> > changes in the FIQ handler stuff I guess that does not matter. The Zynq
> > is a dual ARMv7 Cortex-A9.
> > The handler works for a random timespan and then I see:
> The first rule of FIQs is that they are not permitted to cause any
> aborts what so ever - any aborts can be fatal as they can cause
> deadlock.
> 
> > Bad mode in data abort handler detected
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > Modules linked in: firq(O) ipv6
> > CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.12.0-xilinx-dirty #54
> > task: c05bd420 ti: c05b2000 task.ti: c05b2000
> > PC is at 0xffff1224
> > LR is at arch_cpu_idle+0x20/0x2c
> > pc : [<ffff1224>]    lr : [<c000f344>]    psr: 600e01d1
> > sp : c05b3f70  ip : 00000000  fp : 00000000
> > r10: 00000000  r9 : 413fc090  r8 : c0a264c0
> > r7 : c05a7720  r6 : c04080c8  r5 : c05f2500  r4 : c05b2000
> > r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c0a299f8
> > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment kernel
> > Control: 18c5387d  Table: 1ec0404a  DAC: 00000015
> > Process swapper/0 (pid: 0, stack limit = 0xc05b2240)
> > Stack: (0xc05b3f70 to 0xc05b4000)
> > 3f60:                                     c0a299f8 00000000 00000000 00000000
> > 3f80: c05b2000 c05f2500 c04080c8 c05a7720 c0a264c0 413fc090 00000000 00000000
> > 3fa0: 00000000 c05b3f70 c000f344 ffff1224 600e01d1 ffffffff 00000000 c0055fb8
> > 3fc0: c040a7b0 c0584a5c ffffffff ffffffff c0584574 00000000 00000000 c05a7720
> > 3fe0: 18c5387d c05ba3cc c05a771c c05be440 0000406a 00008074 00000000 00000000
> > [<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] (  (null))
> > Code: e320f000 e320f000 e320f000 eafffffe (e5889000)
> 
> The faulting instruction was:
> 
> 	str r9, [r8]
r8 indeed points to an ioremapped address.
> However, the register dump above does not include the FIQ banked registers,
> so we don't actually know what r8 was.
> 
> > My first guess would be that i had a cache page miss in the fiq handler?
> 
> Yes.
> 
> > I guess the best way would be putting the fiq-handler on the On Chip
> > Memory but then i would still have the same problem that the code jumping
> > to the OCM would have a cache miss?
> 
> I'm guessing that the address pointed to by r8 (the timer base) is
> ioremapped after other threads are already started?  
Yes: for testing purposes I wrote a kernel module which is insmod'ed into the
kernel. So the ioremap for this address is surely executed after the kernel
threads are started (which I guess is what you mean by other threads).
> The problem with
> that is other threads won't have the L1 page table pointers for these
> mappings - we populate these lazily because trying to do it at
> ioremap() time would be extremely painful.
So the success of the FIQ depends on the context of the kernel thread
running (or more precisely on the L1 page table pointers of that particular
thread) when the FIQ hits?

> What might be possible is to have a function which can be called in
> these circumstances which ensures that a kernel address is accessible
> to all threads in the system, though while it's doing that, it would
> have to stop any fork() or exit() activity to be sure that it updated
> every thread.
Well, as this would happen when the FIQ handler is installed, where timing is not
yet critical, that should at least work for this use case.

> In years gone by, I'd have recommended that the kernel mappings for
> this stuff were done via static mappings, but with DT, that's no
> longer acceptable.  So I guess we have a problem...
Oh my, I didn't mean to open a can of worms.

Best regards
Tim


* set_fiq_handler: Bad mode in data abort handler detected
  2014-04-24 10:31 ` set_fiq_handler: Bad mode in data abort handler detected Russell King - ARM Linux
  2014-04-24 11:57   ` Tim Sander
@ 2014-04-24 14:33   ` Tim Sander
  2014-04-24 19:01     ` Russell King - ARM Linux
  1 sibling, 1 reply; 10+ messages in thread
From: Tim Sander @ 2014-04-24 14:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell and List
<snip>
> In years gone by, I'd have recommended that the kernel mappings for
> this stuff were done via static mappings, but with DT, that's no
> longer acceptable.  So I guess we have a problem...
To verify that your very plausible hypothesis is right, I tried to map
the memory at early boot in "zynq_init_late" with:

timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY); //also tried MT_DEVICE

But this fails and gives the following error:

WARNING: CPU: 0 PID: 1 at arch/arm/mm/ioremap.c:301 __arm_ioremap_pfn_caller+0x100/0x184()
which seems to be the only WARN_ON which shows that the pfn is invalid.

Any hints why this call to __arm_ioremap fails?

I also tried to map with ioremap_nocache during module load, but I
guess this mapping also gets propagated lazily, so that didn't work
either.
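
One thing that may be worth checking in the call above is the first
argument.  The sketch below only shows the arithmetic.  It assumes, and
nothing in this thread confirms, that __arm_ioremap() expects a physical
address and derives the page frame number itself, and that PAGE_SHIFT is
12.  TOY_PAGE_SHIFT and toy_phys_to_pfn() are illustrative names, not
kernel symbols.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define TOY_PAGE_SHIFT 12

/* Page frame number of a physical address. */
static uint32_t toy_phys_to_pfn(uint32_t phys)
{
    return phys >> TOY_PAGE_SHIFT;
}
```

With these assumptions, 0x4280000>>PAGE_SHIFT evaluates to 0x4280.  If
the callee shifts that value again, it ends up looking at pfn 0x4, which
is nowhere near the intended device window.  Whether that is what trips
the WARN_ON would have to be checked against the ioremap code in the
tree being used.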

Best regards
Tim


* set_fiq_handler: Bad mode in data abort handler detected
  2014-04-24 14:33   ` Tim Sander
@ 2014-04-24 19:01     ` Russell King - ARM Linux
  2014-04-25 13:36       ` Tim Sander
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-04-24 19:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 24, 2014 at 04:33:38PM +0200, Tim Sander wrote:
> Hi Russell and List
> <snip>
> > In years gone by, I'd have recommended that the kernel mappings for
> > this stuff were done via static mappings, but with DT, that's no
> > longer acceptable.  So I guess we have a problem...
> 
> To verify that your very plausible hypothesis is right i tried:
> timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY); //also tried MT_DEVICE

This isn't going to help.  Any dynamically initialised mapping via any
of the ioremap functions is going to fail for the reason I outlined,
and it doesn't matter what type of mapping you use.  *All* dynamically
created mappings are populated to other threads lazily.

The reason for that is because it's _very_ expensive/racy to walk over
every single thread and update its page tables - Linux years ago used
to do that as standard with ioremap() and similar, and the code was
ripped out after it became too much of a burden.

When I talk about static mappings above, I'm talking about those which
are set up very early in boot via iotable_init().  However, these aren't
permitted with DT anymore.

> the memory at early boot in "zynq_init_late".

My kernel doesn't have zynq_init_late()... I'm guessing that it's hooked
into the .init_late callback, which is certainly too late - this is called
towards the end of driver initialisation, after many threads have already
been spawned.

The places where you are called before any threads have been spawned are
unfortunately places where you can't use ioremap().

At the moment, I don't have an answer to this - the answers I have are
incompatible with the direction that arm-soc people want to go (which is
to have zero static mappings.)


* set_fiq_handler: Bad mode in data abort handler detected
  2014-04-24 19:01     ` Russell King - ARM Linux
@ 2014-04-25 13:36       ` Tim Sander
  2014-04-25 13:51         ` Russell King - ARM Linux
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Sander @ 2014-04-25 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell and List

Thanks for your feedback!
On Thursday, 24 April 2014, 20:01:56, Russell King - ARM Linux wrote:
> > > In years gone by, I'd have recommended that the kernel mappings for
> > > this stuff were done via static mappings, but with DT, that's no
> > > longer acceptable.  So I guess we have a problem...
> > 
> > To verify that your very plausible hypothesis is right i tried:
> > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > //also tried MT_DEVICE
> This isn't going to help.  Any dynamically initialised mapping via any
> of the ioremap functions is going to fail for the reason I outlined,
> and it doesn't matter what type of mapping you use.  *All* dynamically
> created mappings are populated to other threads lazily.
OK, I tried mapping statically during bootup, just to verify and understand the
problem. It seems to help somewhat (probably the mapping does reach more threads),
but it doesn't remedy the problem completely:

static struct map_desc zynq_axi_gp0 __initdata = {
    .virtual   = 0xe4000000, //FIXME just arbitrary, which?
    .pfn    = __phys_to_pfn(0x40000000),
    .length = SZ_128M,
    .type   = MT_DEVICE,
};

static void __init zynq_axi_gp_init(void)
{
    iotable_init(&zynq_axi_gp0,1);
    zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
    BUG_ON(!zynq_axi_gp0_base);
}
This was called in the .map_io callback. But it seems even this is too late to
propagate into all threads. Calling it earlier does not work (e.g. .init_early,
.init_timer or .init_irq)...
Thinking about it, if it's truly lazy, even an early initialization does not
help if mapping synchronisation is always done lazily via data abort.
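
For reference, the address arithmetic implied by the zynq_axi_gp0
descriptor above can be sketched in plain C.  This is only an
illustration of the fixed offset between the physical and virtual
windows; struct toy_map_desc and toy_io_phys_to_virt() are hypothetical
names, and the register address 0x42800000 used below is an assumed
example, not a value taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the fields of the map_desc shown above. */
struct toy_map_desc {
    uint32_t virt;    /* start of the virtual window */
    uint32_t phys;    /* start of the physical window */
    uint32_t length;  /* size of the window in bytes */
};

/*
 * Translate a physical address inside the window to its virtual
 * address.  Returns false if the address lies outside the window.
 */
static bool toy_io_phys_to_virt(const struct toy_map_desc *m,
                                uint32_t phys, uint32_t *virt)
{
    if (phys < m->phys || phys - m->phys >= m->length)
        return false;
    *virt = m->virt + (phys - m->phys);
    return true;
}
```

With virt = 0xe4000000, phys = 0x40000000 and a 128 MiB length, the
assumed register at 0x42800000 would appear at 0xe6800000, and the last
mapped byte at 0xebffffff.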

> The reason for that is because it's _very_ expensive/racy to walk over
> every single thread and update its page tables - Linux years ago used
> to do that as standard with ioremap() and similar, and the code was
> ripped out after it became too much of a burden.
It seems this was before git times? At least it does not seem to be
in the git repository. Do you have a rough estimate of what year that was, or
which kernel version?

> When I talk about static mappings above, I'm talking about those which
> are setup very early in boot via iotable_init().  However, these aren't
> permitted with DT anymore.
As pointed out above, this call at least boots and works, in that I see
the mapped virtual address (0xe4000000) being used.
> > the memory at early boot in "zynq_init_late".
> 
> My kernel doesn't have zynq_init_late()... I'm guessing that it's hooked
> into the .init_late callback, which is certainly too late - this is called
> towards the end of driver initialisation, after many threads have already
> been spawned.
Is there a callback where iotable_init still works and that is early enough?

> The places where you are called before any threads have been spawned are
> unfortunately places where you can't use ioremap().
> 
> At the moment, I don't have an answer to this - the answers I have are
> incompatible with the direction that arm-soc people want to go (which is
> to have zero static mappings.)
OK, you wrote in the earlier mail:
>What might be possible is to have a function which can be called in
>these circumstances which ensures that a kernel address is accessible
>to all threads in the system, though while it's doing that, it would
>have to stop any fork() or exit() activity to be sure that it updated
>every thread.
Would a solution that works that way be acceptable for mainline? 

Besides that, I currently don't understand why the FIQ worked on older,
pre-Cortex-A9 cores with Linux. There is a nice writeup at
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/
which works with an ARMv5 (which has the caches on the "wrong" side), and
I think that it also worked on ARMv6 (aka ARM1136).

Best regards
Tim


* set_fiq_handler: Bad mode in data abort handler detected
  2014-04-25 13:36       ` Tim Sander
@ 2014-04-25 13:51         ` Russell King - ARM Linux
  2014-05-12  7:02             ` Tim Sander
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux @ 2014-04-25 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 25, 2014 at 03:36:48PM +0200, Tim Sander wrote:
> Hi Russell and List
> 
> Thanks for your feedback!
> On Thursday, 24 April 2014, 20:01:56, Russell King - ARM Linux wrote:
> > > > In years gone by, I'd have recommended that the kernel mappings for
> > > > this stuff were done via static mappings, but with DT, that's no
> > > > longer acceptable.  So I guess we have a problem...
> > > 
> > > To verify that your very plausible hypothesis is right i tried:
> > > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > > //also tried MT_DEVICE
> > This isn't going to help.  Any dynamically initialised mapping via any
> > of the ioremap functions is going to fail for the reason I outlined,
> > and it doesn't matter what type of mapping you use.  *All* dynamically
> > created mappings are populated to other threads lazily.
> Ok, i tried mapping statically in bootup. Just to verify and understand the 
> problem. It seems to help somewhat (probably it does go into more threads), 
> but it doesn't remedy the problem completly:
> 
> static struct map_desc zynq_axi_gp0 __initdata = {
>     .virtual   = 0xe4000000, //FIXME just arbitrary, which?
>     .pfn    = __phys_to_pfn(0x40000000),
>     .length = SZ_128M,
>     .type   = MT_DEVICE,
> };
> 
> static void __init zynq_axi_gp_init(void)
> {
>     iotable_init(&zynq_axi_gp0,1);
>     zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
>     BUG_ON(!zynq_axi_gp0_base);
> }
> This was called in the .map_io callback. But it seems, even this is to late to
> propagate into all threads. Calling it earlier does not work (e.g. .init_early 
> ,.init_timer or init_irq)...

It isn't too late.  .map_io is called as part of the very early kernel
initialisation, when the page tables are being set up with real mappings
for the very first time.  There are no interrupts, no real memory allocators,
in fact not much of anything at that point.

I'm afraid that I'm no longer that knowledgeable about whether ioremap
will take account of this stuff or not - other people have been hacking
in this area and my knowledge is outdated.

> Thinking about it, if its truly lazy even an early initialization does not 
> help if mapping synchronisation is allways done lazy via data abort.

This is how it works.

.map_io is called with the init_mm as the current mm structure.  This
contains the page tables.  Calling iotable_init() sets up mappings in
that page table.  No other threads exist at this point.

When a kernel thread is spawned, all L1 page tables for kernel mappings
are copied to the child's page tables.  Therefore, the mappings set up
via iotable_init() will propagate into the children without any data
aborts.

On ioremap(), the init_mm's page tables are updated with the L1 entries.
Other page tables are not updated until an access is performed, which
causes a data abort if there is no L1 page table entry.

So, .map_io should resolve the problem.  If it doesn't, something else
is going on - maybe ioremap() is trampling all over your static mappings...
though I thought we put the iotable_init()-created mappings into the
vmalloc list, which should prevent it.  I don't know anymore...
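
The three cases described above (static mapping before any thread
exists, copy at spawn time, lazy fixup after ioremap()) can be modelled
in a few lines of userspace C.  This is a toy model under heavy
assumptions, not kernel code: every name in it (toy_mm, toy_init_mm,
toy_map_kernel, toy_spawn, toy_access) is invented, and an "L1 entry" is
just a non-zero word in an array.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_NR_L1 8  /* toy number of L1 slots */

struct toy_mm {
    unsigned long l1[TOY_NR_L1];  /* 0 means "no L1 entry" */
    unsigned int lazy_faults;     /* aborts taken to fix up entries */
};

/* Stand-in for init_mm: the reference kernel page table. */
static struct toy_mm toy_init_mm;

/* iotable_init() and ioremap() both end up writing the reference table. */
static void toy_map_kernel(unsigned int idx, unsigned long entry)
{
    toy_init_mm.l1[idx] = entry;
}

/* Spawning copies the kernel L1 entries that exist at that moment. */
static void toy_spawn(struct toy_mm *child)
{
    memcpy(child->l1, toy_init_mm.l1, sizeof(child->l1));
    child->lazy_faults = 0;
}

/*
 * Returns true if the access hits an existing entry.  Otherwise it
 * models the data abort: the entry is copied from the reference table,
 * the fault is counted, and false is returned.
 */
static bool toy_access(struct toy_mm *mm, unsigned int idx)
{
    if (mm->l1[idx] != 0)
        return true;
    mm->lazy_faults++;
    mm->l1[idx] = toy_init_mm.l1[idx];
    return false;
}
```

In this model an entry created before toy_spawn() is never missing in the
child, while one created afterwards costs exactly one fault per table.
In normal kernel context that fault is repaired transparently; taken in
FIQ mode it is the "Bad mode in data abort handler" report instead.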

> > The reason for that is because it's _very_ expensive/racy to walk over
> > every single thread and update its page tables - Linux years ago used
> > to do that as standard with ioremap() and similar, and the code was
> > ripped out after it became too much of a burden.
>
> It seems as if this was before git times? At least it does not seem to be 
> in the git repository. Do you have an rough estimate in what year that was or 
> which kernel version?

Yes, a very long time ago, well before git - probably 1.2 or 2.0 kernel time
(or their development counterparts.)

I'm sorry, I don't think I can really help anymore with this problem.
I've given you the best that my limited knowledge of the ARM kernel
today allows, which is reducing as I don't really hack on the ARM kernel
very much anymore, and I'm not involved with many of the changes which
happen today.


* Re: set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault)
  2014-04-25 13:51         ` Russell King - ARM Linux
@ 2014-05-12  7:02             ` Tim Sander
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Sander @ 2014-05-12  7:02 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm

Hi

I am still hunting the MMU faults during FIQ, but I have some new information
which seems to warrant a new mail. First, for reference, the thread start:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/250196.html
I am also cc'ing linux-mm, as this also seems to concern mm.

On Friday, 25 April 2014, 14:51:18, Russell King - ARM Linux wrote:
> On Fri, Apr 25, 2014 at 03:36:48PM +0200, Tim Sander wrote:
> > Hi Russell and List
> > 
> > Thanks for your feedback!
> > 
> > On Thursday, 24 April 2014, 20:01:56, Russell King - ARM Linux wrote:
> > > > > In years gone by, I'd have recommended that the kernel mappings for
> > > > > this stuff were done via static mappings, but with DT, that's no
> > > > > longer acceptable.  So I guess we have a problem...
> > > > 
> > > > To verify that your very plausible hypothesis is right i tried:
> > > > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > > > //also tried MT_DEVICE
> > > 
> > > This isn't going to help.  Any dynamically initialised mapping via any
> > > of the ioremap functions is going to fail for the reason I outlined,
> > > and it doesn't matter what type of mapping you use.  *All* dynamically
> > > created mappings are populated to other threads lazily.
> > 
> > Ok, i tried mapping statically in bootup. Just to verify and understand the
> > problem. It seems to help somewhat (probably it does go into more threads),
> > but it doesn't remedy the problem completly:
> >
> > static struct map_desc zynq_axi_gp0 __initdata = {
> >     .virtual   = 0xe4000000, //FIXME just arbitrary, which?
> >     .pfn    = __phys_to_pfn(0x40000000),
> >     .length = SZ_128M,
> >     .type   = MT_DEVICE,
> > };
> >
> > static void __init zynq_axi_gp_init(void)
> > {
> >     iotable_init(&zynq_axi_gp0,1);
> >     zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
> >     BUG_ON(!zynq_axi_gp0_base);
> > }
> > This was called in the .map_io callback. But it seems, even this is to
> > late to propagate into all threads. Calling it earlier does not work
> > (e.g. .init_early ,.init_timer or init_irq)...
> 
> It isn't too late.  .map_io is called as part of the very early kernel
> initialisation, when the page tables are being setup with real mappings
> for the very first time.  There's no interrupts, no real memory allocators,
> in fact not much of anything at that point.
> 
> I'm afraid that I'm no longer that knowledgeable about whether ioremap
> will take account of this stuff or not - other people have been hacking
> in this area and my knowledge is outdated.
> 
> > Thinking about it, if its truly lazy even an early initialization does not
> > help if mapping synchronisation is allways done lazy via data abort.
> 
> This is how it works.
> 
> .map_io is called with the init_mm as the current mm structure.  This
> contains the page tables.  Calling iotable_init() sets up mappings in
> that page table.  No other threads exist at this point.
> 
> When a kernel thread is spawned, all L1 page tables for kernel mappings
> are copied to the child's page tables.  Therefore, the mappings setup
> via iotable_init() will propagate into the children without any data
> aborts.
> 
> On ioremap(), the init_mm's page tables are updated with the L1 entries.
> Other page tables are not updated until an access is performed, which
> causes a data abort if there is no L1 page table entry.
> 
> So, .map_io should resolve the problem.  If it doesn't, something else
> is going on - maybe ioremap() is trampling all over your static mappings...
> though I thought we put the iotable_init()-created mappings into the
> vmalloc list, which should prevent it.  I don't know anymore...
I did a prefaulting pass over all available processes:
    for_each_process(process)
    {
        printk("process: %s [%d]\n",process->comm,process->pid);
        if(process->mm) {
            switch_mm(old_process->mm,process->mm,process);
            ioread32(priv->my_hardware);   // access the memory, prefault mmu
            old_process = process;
        }
    }
but I still get the "Bad mode in data abort":
Bad mode in data abort handler detected
Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
Modules linked in: firq(O+) ipv6
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.12.0-xilinx-00005-gc9455c0-dirty #97
task: c05cb420 ti: c05c0000 task.ti: c05c0000
PC is at 0xe3fc0000
LR is at arch_cpu_idle+0x20/0x2c
pc : [<e3fc0000>]    lr : [<c000f344>]    psr: 600701d1
sp : c05c1f70  ip : 00000000  fp : 00000000
r10: 00000000  r9 : 413fc090  r8 : c0a7b4c0
r7 : c05b6088  r6 : c0412348  r5 : c06008c0  r4 : c05c0000
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c0a7e9f8
Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment kernel
Control: 18c5387d  Table: 1ec2c04a  DAC: 00000015
Process swapper/0 (pid: 0, stack limit = 0xc05c0240)
Stack: (0xc05c1f70 to 0xc05c2000)
1f60:                                     c0a7e9f8 00000000 00000000 00000000
1f80: c05c0000 c06008c0 c0412348 c05b6088 c0a7b4c0 413fc090 00000000 00000000
1fa0: 00000000 c05c1f70 c000f344 e3fc0000 600701d1 ffffffff 00000000 c0056748
1fc0: c0414a30 c0592a60 ffffffff ffffffff c0592574 00000000 00000000 c05b6088
1fe0: 18c5387d c05c83cc c05b6084 c05cc440 0000406a 00008074 00000000 00000000
[<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] (  (null))
Code: bad PC value
---[ end trace 38f263d4b2076bcb ]---
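
The "Flags:" line of the dump can be reproduced from the raw psr value
(600701d1) by hand.  A minimal sketch follows; the helper names are
invented for illustration, and the mode strings simply mirror the
spelling used in the dump.

```c
#include <stdint.h>

#define TOY_PSR_MODE_MASK 0x1fu
#define TOY_PSR_T_BIT     (1u << 5)  /* Thumb state */
#define TOY_PSR_F_BIT     (1u << 6)  /* FIQs masked when set */
#define TOY_PSR_I_BIT     (1u << 7)  /* IRQs masked when set */

/* Name of the processor mode encoded in psr bits 4:0. */
static const char *toy_psr_mode_name(uint32_t psr)
{
    switch (psr & TOY_PSR_MODE_MASK) {
    case 0x10: return "USER_32";
    case 0x11: return "FIQ_32";
    case 0x12: return "IRQ_32";
    case 0x13: return "SVC_32";
    case 0x17: return "ABT_32";
    case 0x1b: return "UND_32";
    case 0x1f: return "SYS_32";
    default:   return "unknown";
    }
}

/* Writes the N, Z, C, V flags (bits 31:28): upper case = set. */
static void toy_psr_flags(uint32_t psr, char out[5])
{
    out[0] = (psr & (1u << 31)) ? 'N' : 'n';
    out[1] = (psr & (1u << 30)) ? 'Z' : 'z';
    out[2] = (psr & (1u << 29)) ? 'C' : 'c';
    out[3] = (psr & (1u << 28)) ? 'V' : 'v';
    out[4] = '\0';
}
```

For 0x600701d1 this gives mode FIQ_32, flags nZCv, and both the I and F
bits set, matching "IRQs off  FIQs off  Mode FIQ_32" in the dump: the
abort was taken while the CPU was executing in FIQ mode.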

But then I realized that it's always swapper/0 which is faulting, and I don't see a pid 0 process
in my for_each_process loop. So I tried some special handling for pid 0 to prefault it as well:

    process = pid_task(&init_struct_pid, PIDTYPE_PID);
    if(process) {
        printk("process: %s [%d]\n",process->comm,process->pid);
        switch_mm(current_task->mm,process->mm,process);
        ioread32(priv->my_hardware);  // access the memory, prefault mmu
        switch_mm(process->mm,current_task->mm,current_task);
    } else printk("process pid prefault failed\n");  //<this path is taken

But it seems that the scheduler's pid struct has no process associated. So it's
not possible to get the mm_struct for pid 0. The structure can't be
implicit, or otherwise there would be an MMU entry due to the prefaulting done
or due to the static mapping. So it seems there is an MMU table which is not
associated with any process and is used during scheduler/swapper work...
but where is it hiding?
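
A possible reason why pid 0 never shows up in the loop: as far as can be
told from the macro's definition, for_each_process() starts at init_task
and stops as soon as the walk comes back to it, so the list head itself
is never handed to the loop body.  The userspace sketch below mimics only
that shape; toy_task, toy_for_each_process and toy_visits are invented
names, and the real task list is of course not a hand-built three-element
ring.

```c
#include <stddef.h>

struct toy_task {
    int pid;
    struct toy_task *next;  /* circular list, head is the idle task */
};

/*
 * Same shape as the kernel macro is assumed to have: advance first,
 * then compare with the head, so the head is skipped.
 */
#define toy_for_each_process(p, head) \
    for ((p) = (head); ((p) = (p)->next) != (head); )

/* Counts how often the loop body sees a task with the given pid. */
static int toy_visits(struct toy_task *head, int pid)
{
    struct toy_task *p;
    int n = 0;

    toy_for_each_process(p, head)
        if (p->pid == pid)
            n++;
    return n;
}
```

In a ring of idle (pid 0), init (pid 1) and kthreadd (pid 2) with the
idle task as head, the loop visits pids 1 and 2 once each and pid 0 not
at all.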

I am sure that the error seen is an MMU translation fault, as the fault status bits of the
DFSR show 00101 or 00111, which is an MMU translation fault for section or page.
I have also verified that the address accessed by the FIQ handler routine is
my_hardware. Also, the fact that the handler is working *most* of the time fits
well with an MMU translation fault.
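
For anyone wanting to repeat the DFSR check, here is a small sketch of
the decode.  It assumes the ARMv7 short-descriptor format (no LPAE), in
which the five-bit fault status is split across DFSR bit 10 and bits
3:0.  The function names are invented, and only the two encodings
mentioned above are given names.

```c
#include <stdint.h>

/* Assemble FS[4:0] from DFSR bit 10 and bits 3:0 (short-descriptor). */
static unsigned int toy_dfsr_fault_status(uint32_t dfsr)
{
    return (((dfsr >> 10) & 1u) << 4) | (dfsr & 0xfu);
}

/* Names only the two fault status values discussed in this mail. */
static const char *toy_dfsr_fs_name(unsigned int fs)
{
    switch (fs) {
    case 0x05: return "translation fault (section)";
    case 0x07: return "translation fault (page)";
    default:   return "other";
    }
}
```

A section translation fault (00101) means the L1 entry itself is
missing, which is the case the lazy L1 propagation would produce; a page
translation fault (00111) means the L1 entry exists but the L2 entry
does not.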

Another interesting fact is that I see this problem if the interrupt rate is slow (e.g. one
interrupt per second); if it is faster (probably kernel HZ(?), but hard to tell as the error is not
deterministic), the faults seem to go away.

Best regards
Tim


* set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault)
@ 2014-05-12  7:02             ` Tim Sander
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Sander @ 2014-05-12  7:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi

I am still hunting the mmu faults during FIQ. But i have some new information
which seem to warrant a new mail. But first for reference the thread start:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/250196.html
as i am also cc'ing linux-mm as this seems also concerning mm.

Am Freitag, 25. April 2014, 14:51:18 schrieb Russell King - ARM Linux:
> On Fri, Apr 25, 2014 at 03:36:48PM +0200, Tim Sander wrote:
> > Hi Russell and List
> > 
> > Thanks for your feedback!
> > 
> > Am Donnerstag, 24. April 2014, 20:01:56 schrieb Russell King - ARM Linux:
> > > > > In years gone by, I'd have recommended that the kernel mappings for
> > > > > this stuff were done via static mappings, but with DT, that's no
> > > > > longer acceptable.  So I guess we have a problem...
> > > > 
> > > > To verify that your very plausible hypothesis is right i tried:
> > > > timer_memory = __arm_ioremap(0x4280000>>PAGE_SHIFT,0x1000,MT_MEMORY);
> > > > //also tried MT_DEVICE
> > > 
> > > This isn't going to help.  Any dynamically initialised mapping via any
> > > of the ioremap functions is going to fail for the reason I outlined,
> > > and it doesn't matter what type of mapping you use.  *All* dynamically
> > > created mappings are populated to other threads lazily.
> > 
> > Ok, i tried mapping statically in bootup. Just to verify and understand
> > the
> > problem. It seems to help somewhat (probably it does go into more
> > threads),
> > but it doesn't remedy the problem completly:
> > 
> > static struct map_desc zynq_axi_gp0 __initdata = {
> > 
> >     .virtual   = 0xe4000000, //FIXME just arbitrary, which?
> >     .pfn    = __phys_to_pfn(0x40000000),
> >     .length = SZ_128M,
> >     .type   = MT_DEVICE,
> > 
> > };
> > 
> > static void __init zynq_axi_gp_init(void)
> > {
> > 
> >     iotable_init(&zynq_axi_gp0,1);
> >     zynq_axi_gp0_base = (void __iomem *) zynq_axi_gp0.virtual;
> >     BUG_ON(!zynq_axi_gp0_base);
> > 
> > }
> > This was called in the .map_io callback. But it seems, even this is to
> > late to propagate into all threads. Calling it earlier does not work
> > (e.g. .init_early ,.init_timer or init_irq)...
> 
> It isn't too late.  .map_io is called as part of the very early kernel
> initialisation, when the page tables are being setup with real mappings
> for the very first time.  There's no interrupts, no real memory allocators,
> in fact not much of anything at that point.
> 
> I'm afraid that I'm no longer that knowledgeable about whether ioremap
> will take account of this stuff or not - other people have been hacking
> in this area and my knowledge is outdated.
> 
> > Thinking about it, if its truly lazy even an early initialization does not
> > help if mapping synchronisation is allways done lazy via data abort.
> 
> This is how it works.
> 
> .map_io is called with the init_mm as the current mm structure.  This
> contains the page tables.  Calling iotable_init() sets up mappings in
> that page table.  No other threads exist at this point.
> 
> When a kernel thread is spawned, all L1 page tables for kernel mappings
> > are copied to the child's page tables.  Therefore, the mappings set up
> via iotable_init() will propagate into the children without any data
> aborts.
> 
> On ioremap(), the init_mm's page tables are updated with the L1 entries.
> Other page tables are not updated until an access is performed, which
> causes a data abort if there is no L1 page table entry.
> 
> So, .map_io should resolve the problem.  If it doesn't, something else
> is going on - maybe ioremap() is trampling all over your static mappings...
> though I thought we put the iotable_init()-created mappings into the
> vmalloc list, which should prevent it.  I don't know anymore...
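
To make sure I understood the propagation rules quoted above, here is a
rough userspace toy model of them. This is NOT kernel code: every name in
it (toy_mm, toy_init_mm, toy_iotable_init, toy_fork, toy_ioremap,
toy_access) is made up for illustration, and the 8-entry table stands in
for the real L1 table. It only models "copy at thread creation, fix up
lazily on abort, no fix-up possible from FIQ context":

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define L1_ENTRIES 8            /* toy size, not the real L1 table size */

struct toy_mm {
    unsigned int l1[L1_ENTRIES];    /* 0 means "no mapping" */
    int faults;                     /* lazy fix-ups taken through this mm */
};

/* Reference table, playing the role of init_mm's page tables. */
static struct toy_mm toy_init_mm;

/* Static mapping created before any thread exists (models iotable_init()). */
static void toy_iotable_init(int idx, unsigned int entry)
{
    assert(idx >= 0 && idx < L1_ENTRIES);
    toy_init_mm.l1[idx] = entry;
}

/* Thread creation: all kernel L1 entries are copied from the reference. */
static void toy_fork(struct toy_mm *child)
{
    memcpy(child->l1, toy_init_mm.l1, sizeof(child->l1));
    child->faults = 0;
}

/* Dynamic mapping (models ioremap()): only the reference table is updated. */
static void toy_ioremap(int idx, unsigned int entry)
{
    assert(idx >= 0 && idx < L1_ENTRIES);
    toy_init_mm.l1[idx] = entry;
}

/*
 * Access through a given mm. A missing entry is copied from the reference
 * table, which models the data abort fix-up. From FIQ context the abort
 * cannot be handled, so the access fails instead.
 */
static bool toy_access(struct toy_mm *mm, int idx, bool in_fiq)
{
    assert(idx >= 0 && idx < L1_ENTRIES);
    if (mm->l1[idx])
        return true;                /* entry present, no abort */
    if (in_fiq || !toy_init_mm.l1[idx])
        return false;               /* abort that cannot be fixed up */
    mm->l1[idx] = toy_init_mm.l1[idx];
    mm->faults++;
    return true;
}
```

In this model a thread created after toy_iotable_init() can always be
accessed from "FIQ context", while a thread created before toy_ioremap()
fails until some non-FIQ access has faulted the entry in.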
I tried prefaulting the mapping for each available process:
    for_each_process(process)
    {
        printk("process: %s [%d]\n",process->comm,process->pid);
        if(process->mm) {
            switch_mm(old_process->mm,process->mm,process);
            ioread32(priv->my_hardware);   // access the memory, prefault mmu
            old_process = process;
        }
    }
but I still get the "Bad mode in data abort":
Bad mode in data abort handler detected
Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
Modules linked in: firq(O+) ipv6
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.12.0-xilinx-00005-gc9455c0-dirty #97
task: c05cb420 ti: c05c0000 task.ti: c05c0000
PC is at 0xe3fc0000
LR is at arch_cpu_idle+0x20/0x2c
pc : [<e3fc0000>]    lr : [<c000f344>]    psr: 600701d1
sp : c05c1f70  ip : 00000000  fp : 00000000
r10: 00000000  r9 : 413fc090  r8 : c0a7b4c0
r7 : c05b6088  r6 : c0412348  r5 : c06008c0  r4 : c05c0000
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c0a7e9f8
Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment kernel
Control: 18c5387d  Table: 1ec2c04a  DAC: 00000015
Process swapper/0 (pid: 0, stack limit = 0xc05c0240)
Stack: (0xc05c1f70 to 0xc05c2000)
1f60:                                     c0a7e9f8 00000000 00000000 00000000
1f80: c05c0000 c06008c0 c0412348 c05b6088 c0a7b4c0 413fc090 00000000 00000000
1fa0: 00000000 c05c1f70 c000f344 e3fc0000 600701d1 ffffffff 00000000 c0056748
1fc0: c0414a30 c0592a60 ffffffff ffffffff c0592574 00000000 00000000 c05b6088
1fe0: 18c5387d c05c83cc c05b6084 c05cc440 0000406a 00008074 00000000 00000000
[<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] (  (null))
Code: bad PC value
---[ end trace 38f263d4b2076bcb ]---

But then I realized that it is always swapper/0 which is faulting, and I
don't see a pid 0 process in my for_each_process loop. So I tried some
special handling for pid 0 to prefault it as well:

    process = pid_task(&init_struct_pid, PIDTYPE_PID);
    if(process) {
        printk("process: %s [%d]\n",process->comm,process->pid);
        switch_mm(current_task->mm,process->mm,process);
        ioread32(priv->my_hardware);  // access the memory, prefault mmu
        switch_mm(process->mm,current_task->mm,current_task);
    } else printk("process pid prefault failed\n");  //<this path is taken

But it seems that the scheduler pid struct has no process associated, so it
is not possible to get the mm_struct for pid 0 this way. The structure can't
be implicit, otherwise there would be an MMU entry due to the prefaulting
done above or due to the static mapping. So it seems there is an MMU table
which is not associated with any process and is used during
scheduler/swapper work... but where is it hiding?

I am sure that the error seen is an MMU translation fault, as the FS bits of
the DFSR show 00101 or 00111, which is an MMU translation fault for a section
or a page respectively. I have also verified that the address accessed by the
FIQ handler routine is my_hardware. The fact that the handler works *most*
of the time also fits an MMU translation fault well.

Another interesting fact is that I see this problem when the interrupt rate
is slow (e.g. one interrupt per second). When it is faster (probably faster
than the kernel HZ(?), but that is hard to tell as the error is not
deterministic), the faults seem to go away.

Best regards
Tim

* Re: set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault)
  2014-05-12  7:02             ` Tim Sander
@ 2014-05-12 19:06               ` Nicolas Pitre
  -1 siblings, 0 replies; 10+ messages in thread
From: Nicolas Pitre @ 2014-05-12 19:06 UTC (permalink / raw)
  To: Tim Sander; +Cc: Russell King - ARM Linux, linux-mm, linux-arm-kernel

On Mon, 12 May 2014, Tim Sander wrote:

> I did an prefaulting for each available processes:
>     for_each_process(process)
>     {
>         printk("process: %s [%d]\n",process->comm,process->pid);
>         if(process->mm) {
>             switch_mm(old_process->mm,process->mm,process);
>             ioread32(priv->my_hardware);   // access the memory, prefault mmu
>             old_process = process;
>         }
>     }
> but still i get the the "Bad mode in data abort":
> Bad mode in data abort handler detected
> Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> Modules linked in: firq(O+) ipv6
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.12.0-xilinx-00005-gc9455c0-dirty #97
> task: c05cb420 ti: c05c0000 task.ti: c05c0000
> PC is at 0xe3fc0000
> LR is at arch_cpu_idle+0x20/0x2c
> pc : [<e3fc0000>]    lr : [<c000f344>]    psr: 600701d1
> sp : c05c1f70  ip : 00000000  fp : 00000000
> r10: 00000000  r9 : 413fc090  r8 : c0a7b4c0
> r7 : c05b6088  r6 : c0412348  r5 : c06008c0  r4 : c05c0000
> r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : c0a7e9f8
> Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment kernel
> Control: 18c5387d  Table: 1ec2c04a  DAC: 00000015
> Process swapper/0 (pid: 0, stack limit = 0xc05c0240)
> Stack: (0xc05c1f70 to 0xc05c2000)
> 1f60:                                     c0a7e9f8 00000000 00000000 00000000
> 1f80: c05c0000 c06008c0 c0412348 c05b6088 c0a7b4c0 413fc090 00000000 00000000
> 1fa0: 00000000 c05c1f70 c000f344 e3fc0000 600701d1 ffffffff 00000000 c0056748
> 1fc0: c0414a30 c0592a60 ffffffff ffffffff c0592574 00000000 00000000 c05b6088
> 1fe0: 18c5387d c05c83cc c05b6084 c05cc440 0000406a 00008074 00000000 00000000
> [<c000f344>] (arch_cpu_idle+0x20/0x2c) from [<00000000>] (  (null))
> Code: bad PC value
> ---[ end trace 38f263d4b2076bcb ]---
> 
> But then i realized that its always swapper/0 which is faulting. But i don't see a pid 0 process
> in my for_each_process loop. So i tried some special handling for pid 0 to also prefault it:
> 
>     process = pid_task(&init_struct_pid, PIDTYPE_PID);
>     if(process) {
>         printk("process: %s [%d]\n",process->comm,process->pid);
>         switch_mm(current_task->mm,process->mm,process);
>         ioread32(priv->my_hardware);  // access the memory, prefault mmu
>         switch_mm(process->mm,current_task->mm,current_task);
>     } else printk("process pid prefault failed\n");  //<this path is taken
> 
> But it seems that the scheduler pid struct has no process associated. So its 
> not possible to get the mmu_struct for the pid 0. The structure can't be 
> implicit or otherwise there should be an mmu entry due to the prefaulting done
> or due to the static mapping. So it seems there is an MMU table which is not
> associated with any process and is used during scheduler/swapper work...
> but where is it hiding?

The mm_struct for PID 0 is at &init_mm.

Try: switch_mm(current_task->mm, &init_mm, NULL);


Nicolas

end of thread, other threads:[~2014-05-12 19:06 UTC | newest]

Thread overview: 10+ messages
     [not found] <2527501.cXAbiV8bqS@dabox>
2014-04-24 10:31 ` set_fiq_handler: Bad mode in data abort handler detected Russell King - ARM Linux
2014-04-24 11:57   ` Tim Sander
2014-04-24 14:33   ` Tim Sander
2014-04-24 19:01     ` Russell King - ARM Linux
2014-04-25 13:36       ` Tim Sander
2014-04-25 13:51         ` Russell King - ARM Linux
2014-05-12  7:02           ` set_fiq_handler: Bad mode in data abort handler detected (mmu translation fault) Tim Sander
2014-05-12  7:02             ` Tim Sander
2014-05-12 19:06             ` Nicolas Pitre
2014-05-12 19:06               ` Nicolas Pitre